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Abstract 

The Artificial Compressibility Method (ACM) for the incompressible Navier- 
Stokes equations is (link-wise) reformulated (referred to as LW-ACM) by a finite set 
of discrete directions (links) on a regular Cartesian mesh, in analogy with the Lattice 
Boltzmann Method (LBM). The main advantage is the possibility of exploiting well 
established technologies originally developed for LBM and classical computational 
fluid dynamics, with special emphasis on finite differences (at least in the present 
paper), at the cost of minor changes. For instance, wall boundaries not aligned with 
the background Cartesian mesh can be taken into account by tracing the intersec- 
tions of each link with the wall (analogously to LBM technology). LW-ACM requires 
no high-order moments beyond hydrodynamics (often referred to as ghost moments) 
and no kinetic expansion. Like finite difference schemes, only standard Taylor ex- 
pansion is needed for analyzing consistency. Preliminary efforts towards optimal 
implementations have shown that LW-ACM is capable of similar computational 
speed as optimized (BGK-) LBM. In addition, the memory demand is significantly 
smaller than (BGK-) LBM. Importantly, with an efficient implementation, this al- 
gorithm may be one of the few which is compute-bound and not memory-bound. 
Two- and three-dimensional benchmarks are investigated, and an extensive com- 
parative study between the present approach and state of the art methods from the 
literature is carried out. Numerical evidences suggest that LW-ACM represents an 
excellent alternative in terms of simplicity, stability and accuracy. 
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1 Introduction 



Despite a large variety of mesh generation techniques for numerical solvers 
of the fluid dynamics governing equations [1] , addressing complex geometries 
remains a difficult duty. To this aim, several approaches were proposed for 
adapting computational grids to complex geometries by unstructured meshes. 
Generating unstructured meshes of high quality though, is a challenging com- 
putational task per se, which involves quite advanced algorithms (Rupperts 
algorithm. Chews second algorithm, Delaunay triangulation, etc.) llj. While 
those approaches simplify the treatment of boundaries, in turn, each of them 
introduces new difficulties such as extra terms in the equations, extra inter- 
polations, larger computational molecules, and problems associated with the 
transfer of information across grid interfaces. The added complexity makes 
code development even more difficult and increases computation time [2], with 
an additional risk that those algorithms may not lead to an acceptable solu- 
tion. 

An alternative approach which has attracted an increasing interest in recent 
years makes use of Cartesian grids for all cells with the exception of those that 
present intersections with boundaries, which are thus truncated according to 
the shape of the boundary surface. The advantages of Cartesian grids can be 
retained for all cells in the bulk fluid, and a special treatment is only reserved 
to boundary cells. On the contrary, cells fully outside the flow can be simply 
ignored during computations |2]. In the literature, this approach is typically 
referred to as the "embedded boundary method" , the "Cartesian grid method" 
or the "cut-cell method" [3l SI O El |7] . Clearly, the challenging point is to make 
the method accurate in dealing with curved and planar boundaries transversal 
to the grid, even though such boundaries are conveniently approximated in a 
staircase fashion. More specifically, after determining the intersection between 
the Cartesian grid and a boundary, cells whose center lies in the fluid are 
reshaped by discarding their part belonging to the solid wall, while pieces of 
cut cells with the center in the solid are absorbed by neighboring cells [5] . This 
results in the formation of control-volumes which are trapezoidal in shape. 

Classical approaches to the incompressible limit of Navier-Stokes equations re- 
quire (a) dedicated techniques for solving a pressure Poisson equation in order 
to take advantage of the underlying structured nature of the mesh and thus 
speed-up convergence [3]. Moreover, (b) compact multi-dimensional polyno- 
mial interpolating functions are used for obtaining a second-order accurate 
approximation of the fluxes and gradients on the faces of the trapezoidal 
boundary cells from available neighboring cell-center values [5]. Recent de- 
velopments to this also follows a similar approach [6l [7]. 

Both (a) the need of a dedicated solver for the pressure Poisson equation and 



4 



(b) the use of compact multi-dimensional interpolations, can be overcome by 
the lattice Boltzmann method (LBM) 0, while preserving the main features 
of the Cartesian cut-cell method for mesh generation and boundary treatment. 
However, this comes at a price of dealing with specific features inherited from 
the kinetic theory of gases, which are unessential as far as the continuum 
description of incompressible Navier-Stokes equations is the only concern. 

For this reason, we propose a novel formulation of the artificial compressibility 
method (ACM), which retains the convenient features of LBM, namely (a) 
the artificial compressibility and (b) the link-wise formulation based on the 
theory of characteristics, but concurrently gets rid of unessential heritages of 
the kinetic theory of gases. 

Similarities between LBM and ACM [TU] are sometimes reminded in the lit- 
erature. It is well known indeed that the Chapman-Enskog expansion of the 
LBM updating rule delivers the governing equation of ACM: the artificial 
compressibility equations (ACE). The latter consist of the same momentum 
equations as the incompressible Navier-Stokes equations (INSE), in addition 
to an artificial continuity equation including pressure time derivative. ACE 
can be also recovered by the more systematic expansion such as the Hilbert 
method under diffusive scahng [11]. 

The lattice kinetic scheme (LKS) [12] (a variant of LBM) also shows similar- 
ities with ACM at the level of computer programming, despite the fact that 
the former deals with distribution functions of gas molecules, while the latter 
only with hydrodynamic (macroscopic) variables. 

For a special value of the relaxation parameter in the LBM updating rule, 
an updated value of the distribution function depends only on the previous 
equilibrium function at an arbitrary mesh point in the stencil. Since equilibria 
are in turn function of macroscopic variables only, the LKS updating rule 
can be immediately recognized as a kind of finite difference scheme, acting on 
hydrodynamic variables. As a result, the moment system of LKS delivers a 
variant of ACM. Recently, taking advantage of the similarities between LBM 
and ACM, the latter was reformulated as a high order accurate numerical 
method (fourth order in space and second order in time) [13] . 

Motivated by the common belief that an important reason of success of the 
LBM (in particular MRT-LBM [13]) is its remarkable robustness for simulating 
the various complex flows, the stability of the revived ACM has been further 
enhanced [15] . 

In this paper, in an attempt of making ACM even more similar to LBM, we 
propose yet a new formulation of ACM referred to as link-wise ACM (LW- 
ACM) in the following text. For the sake of completeness, we summarize both 
the main features of the revived ACM [THl US] , still valid for the present LW- 
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ACM, and the additional advantages due to a link-wise formulation. 



(1) ACM deals with macroscopic variables only, thus offering the opportu- 
nity of exploiting all the pre-existing finite-difference (FD) technologies: 
This is, for instance, a clear advantage when imposing inlet and outlet 
boundary conditions. On the contrary, LBM needs to account for nonhy- 
drodynamic quantities (sometimes called ghost quantities) which, though 
may not have direct impact on the hydrodynamic behavior, they can 
still be responsible of numerical instabilities [H]. Unfortunately, owing 
to nonlinearities, there are no clear and general recipes yet, on how to 
optimally design ghost quantities with desired stability properties. As far 
as the popular compact stencils are concerned, such as D2Q9, D3Q15 
and D3Q19 [9J with no special corrections, LBM ghost quantities remain 
numerical artifacts: Positive effects of such quantities for enhancing sta- 
bility of usual FD schemes are still far from being clearly demonstrated. 
ACM fully overcomes this issue, focusing instead on the minimum set of 
information for incompressible fluid dynamics. 

(2) Similarly to LBM, ACM posses the ability of computing transient solu- 
tions of incompressible Navier-Stokes equations (INSE), without resorting 
to a Poisson equation for pressure. The underlying idea, directly inspired 
by the asymptotic analysis of LBM schemes, is to multiply the pressure 
time derivative of artificial continuity equation by a mesh-dependent pa- 
rameter. In this way, the numerical Mach number, which is a mere numer- 
ical artifact for INSE (rigorously valid in the limit of vanishing Mach num- 
ber) is linked to the mesh spacing. Higher accuracy than LBM schemes 
can be also achieved by exploiting the asymptotic behavior of the solution 
of the artificial compressibility equations for small Mach numbers [13] . 

(3) ACM can use different meshing techniques. For example, it is possible to 
use simple lattice structures, namely Cartesian structured meshes, even- 
tually recursively refined like those also used by LBM, or it can be even 
formulated in a finite- volume fashion including unstructured body-fitted 
meshes. In the latter case, the same comments discussed at the beginning 
of this section about the computational overhead for generating unstruc- 
tured meshes hold as well. On the other hand, adopting simple lattice 
structures is not so simple as in LBM: (a) the boundary conditions must 
discriminate all possible orientations of the wall with regards to the lat- 
tice structure by a look-up table made of discrete cases, and (b) the 
complex wall treatment depends on the dimensionality of the problem 
(namely the look-up table for 2D is different from the one for 3D). Both 
previous problems can be overcome by LBM thanks to the link-wise for- 
mulation. A "link" is a generic direction identified by a discrete velocity 
of the lattice and coincides with one of the characteristics along which 
advection is performed (consistently with the method of characteristics - 
MOC). Such a numerical scheme based on a finite set of links can cope 
with a complex boundaries by (a) identifying the intersections of each 
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link with the wall and (b) updating the variable corresponding to such 
a link by a local rule. The local rule is always the same (no need for 
look-up tables) and the intersections can be computed once for all during 
pre-processing. The previous procedure easily applies to any orientations 
of the wall with respect to the lattice, as well as to any dimensions. In 
this paper, the above advantages of LBM are made available to ACM. 
(4) ACM deals with the minimum number of fields describing incompressible 
fluid dynamics: D+1, where D is the physical dimension of the problem. 
On the other hand, LBM deals with discrete distribution functions fi, 
which are as many as the lattice velocities Q. LBM has thus a memory 
overhead due to: D+1<Q. Between these two sets of variables, there is 
a simple connection: Local equilibria f^'^^ (Q variables) can be computed 
by means of macroscopic quantities only (D-l-1 variables). Introducing 
a larger set of variables fj;'^^ may seem a redundant and useless arti- 
fact. However, this work aims at demonstrating that formulating ACM 
in terms of f^^^ offers advantages as well. In particular, as far as the updat- 
ing rule of the algorithm is similar to LBM, it is possible (eventually with 
minor changes) to take advantage of most of LBM technology. For exam- 
ple, link-wise ACM can also be formulated in terms of local equilibrium 
f^^^ and this enables a convenient treatment of complex moving bound- 
aries typical of the LBM (see next). In conclusion, link-wise ACM has 
two possible (and fully equivalent) formulations: (a) in terms of macro- 
scopic variables like standard ACM (capable of exploiting pre-existing FD 
technology) and (b) in terms of local equilibrium (capable of exploiting 
pre-existing LBM technology). 



This paper is organized in sections as follow. The link- wise artificial compress- 
ibility algorithm for incompressible isothermal fluid dynamics is introduced In 
Section 2.1, where some classical benchmarks are presented (isothermal Cou- 
ette flow, generalized Green- Taylor vortex flow and Minion & Brown flow) as 
well. In Section 2^ the link-wise wall boundary conditions are discussed, in- 
cluding moving and complex walls, and some numerical tests are presented (2D 
lid driven cavity flow, 3D diagonally driven cavity flow and Circular Couette 
flow). Finally, Section Is] reports some concluding remarks. 



2 Link-wise Artificial Compressibility Method 



2.1 The main algorithm: Link-wise formulation 



Link-wise re-formulation of the Artiflcial Compressibility Method (ACM) for 
the incompressible isothermal fluid dynamics yields the following system of 
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algebraic equations 



/.(x, i+l) = ft\± - v., t) + 2 (^) {ft'''\±, i) - ft'''\± - v„ i)) , 

i = 0,...,Q- 1 

where Q is the number of lattice velocities. Full derivation of the equations 
([T| is provided in the Appendix |B] The fictitious variables are defined at a 
discrete setp] of spatial points x = x/Aa;, where Ax is the dimensionless mesh 
spacing, Ax = Ax' /L, with Ax' the mesh spacing in physical units and L the 
characteristic length scale of the flow field. Similarly, time levels are defined 
as t = t/At, where At is the dimensionless time step. At = At'/{L/U), with 
At' the time step in physical units and U a characteristic flow speed. 

The Q lattice velocities Vj are defined according to the considered scheme [9] . 
All points X form a regular lattice such that x — Vj belongs to the lattice, 
regardless of x and Vj. The quantities f- '^^ are local functions of density p = 
J2i fi and momentum pu = J2i Vj/j computed at x and t, namely fj;^^ = 
fi'^\p, u) (see Appendix A for some examples). 

The quantities f- ^^ are designed in order to recover the incompressible isother- 
mal fluid dynamics [D]. In particular, recovering incompressible Euler equations 
requires that J2i ft^ = P and J2i ^ift^ = pu, i.e. conservation of hydrody- 
namic moments, and J2i'^i^ift^ = n^^^ = puu + pi, with p function of 
density only (isothermal case): p = p{p). Further constraints can be found by 
asymptotic analysis (see Appendix [C] for details). However consistency leaves 
some degrees of freedom in designing these functions, which can be used for 



improving stability (one possible strategy is discussed in Appendix D ) 



On the other hand, the quantities //'^'"^ which are the odd parts of equilibria, 
are defined as 

fl''"\p.^) = \{fl'\p.^)-fl'\p.-^))- (2) 

A clear advantage of the above scheme is that all quantities appearing in Eqs. 
([T| and ([2]) only depends on known (equilibrium) functions at a mesh node and 
its close neighbors. This introduces a significant simplification in the treatment 
of boundary conditions, which can be directly borrowed from finite-difference 
technology (see e.g. the isothermal Couette flow test case reported in Section 



2.5.1). 



Similarly to LBM, the algebraic equations ([T| can be implemented in three 



^ If not evident otherwise, we use "hat" notation for lattice quantities expressed by 
means of integer values and "prime" notation for quantities expressed in physical 
units. 
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subsequent steps ("pull" formulation), namely pre-combining, streaming and 
post-combining, 

/;(x - v„ i) = ft\^ - v„ i) - 2 ft\± - v„ t), (3a) 

/r(x,t + l) = /;(x-v„t), (3b) 
/,(x,t+ 1) = /-(x,t + 1) + 2 ft'"\^,i)- (3c) 

Pre- and post-combining are local processes involving arithmetic operators, 
whereas streaming alone takes care of data exchange among the nearest neigh- 
bors of an arbitrary cell. 

The implementation strategy given by Eqs. ^ admits a straightforward in- 
clusion of external forcing, by considering the additional step 

/f"-(x, t + 1) = /,(x, £ + 1) + /f (p(x, t), g(x, i)) , (4) 

where g = {gx,gy)'^ is the external acceleration. The previous correction is 
local: Similarly to finite-difference schemes, the external forcing is applied 
to the point where it is supposed to act. The functions fl^'""* are used for 
convenience (they are already known) , for ensuring that the force only applies 
to the momentum equations. 

We notice that, the same simple procedure cannot be applied to the lattice 
Boltzmann method, because a correction to the distribution function may af- 
fect the dynamics of the higher order moments as well. Consistent treatment of 
the forcing typically involves some special (non-trivial) techniques [12] . Details 
on the effects due to the correction (|4]), by asymptotic analysis, are reported 
in the Appendix [C| Imposing a given physical acceleration g within a flow, 
requires the tuning of numerical acceleration g, namely g = g (in case of dif- 
fusive scaling). The same approach can be adopted to implement mass sources 
in the numerical scheme. 



2.2 The main algorithm: Finite difference formulation 

Since the right hand side of Eq. ([T]) only depends on the equilibrium condition, 
which is in turn a function of the macroscopic quantities, it is possible to pro- 
vide a finite difference formula, expressed in terms of macroscopic quantities, 
which is fully equivalent to ([T]) . As commonly done in the finite-difference liter- 
ature, we denote by {P} the set of computational points surrounding a generic 
point P (otherwise stated, the generic computational stencil). All the quanti- 
ties are intended computed at the generic time level t, while the superscript 
denotes a quantity at the next time level t + 1. The unknown quantities 
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are given by the velocity components u = {u,v)'^ and the pressure p. Hence 
the equivalent finite-difference formulas must provide a way to compute Up, 
Vp and pp namely 

Up = fu {U{P} , V{py , p{py ; , (5a) 

= fv {u{p} , V{P} , p{p} ; w) , (5b) 
Pp = fp {u{P} , V{P} , p{p} ; . (5c) 

See Appendix |E] for a complete example based on the D2Q9 lattice |9j. The 
same finite-difference counterpart can be found for the Lattice Kinetic Scheme 
(LKS) [12], recovered in case a; = 1, but Eq. ([T]) is also valid for tunable u 
and consequently tunable viscosity u (in particular, for high Reynolds number 
flows). Moreover, the same derivation can be done for the FD-LKSz/ proposed 
in ^20j , but Eq. ([T]) is formulated only along a particular lattice link and hence 
it can also take advantage of most of LBM technology (which is link- wise). 

(1) Availability of two alternative formulations of the same numerical scheme 
is very convenient. For example, it is possible to commute (even dynam- 
ically) between the formulation based on Eq. ([T]) and that based on Eqs. 
(|5|, depending on the best option in dealing with the local boundary 
conditions. 

(2) Similarly to conventional ACM j20t [T3| [T5] , the formulation based on Eqs. 
(|5| can be improved by introducing a semi-implicit step for updating the 
pressure fleld. Essentially the step in Eq. (fsb) can be substituted by 



Pp 



fp{u{P},V{py,P{py,uj) , (6) 



using the already updated velocity fleld. 
(3) The flnite-difference formulation allows to choose different relaxation pa- 
rameters u for different (macroscopic) equations. Let us deflne Uu the re- 
laxation frequency used in Eq. (|5^) and (|5|d), while let us deflne Up ^ Uu 
the relaxation frequency used in Eq. ^jp). Consequently two kinematic 
viscosities follow, namely z/ = u^u) and z/p = //(wp), where the function 
u = u^co) is given by Eq. ([T]). By introducing these relaxation frequen- 
cies in Eqs. ^ and applying Taylor expansion to such novel expressions 
(see Appendix [C] for details), the equivalent system of macroscopic equa- 
tions can be recovered in the continuum limit (e — > 0). The momen- 
tum equation involves the kinematic viscosity u (as previously), while 
the pseudo-compressibility term in the artiflcially compressible continu- 



ity equation, namely the flrst term in Eq. (C.16), becomes proportional 
to e^/z/p. Hence, for high Reynold number flows, it is possible to realize 
z/ ^ 1, while z/p ~ 1 with a more accurate fulflUment of the diverge-free 
condition for the velocity fleld. 
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2.3 Physical and lattice units 



Table 1 

Conversion table from lattice to physical units in case of diffusive scaling, i.e. Ax = e 
and At = with e = Ax/L = 1/N and N the number of mesh points along one 
axis. Quantities in lattice units are readily computed in the code, but they are 
mesh-dependent. Corresponding quantities in physical units are mesh- independent 
and can be computed during post-processing. In the text, in case of ambiguity, 
quantities in physical units are denoted by over-bar. 



Quantity 



Lattice units 



Physical units 



Pressure 
Velocity 



Force by Eqs. (23, 24|) 
Torque by Eqs. (34 
Temperature 



P = l/3 EJ. 



35) r = E: 



eS 



X 



X Pi 



u = u/e 

A^= Jo)A' 
t = T 
T = T 



Let us assume Ax = e and At = (diffusive scaling), with e = Ax/L = 1/N 
and N the number of mesh points. As reported in the Appendix [C} asymptotic 
analysis [11] of ([TJ and ^ shows that, in the limit of vanishing grid spacing, 
e ^ 1, the quantities p = {p — Po)/^^ = {p' ~ P'oJ/U"^ u = u/e = u' /U 
satisfy the incompressible isothermal Navier-Stokes equations, with viscosity 



3 \uj 



2l- 



(7) 



where the subscript denotes mean value over the whole computational do- 
main. Here, we stress that for a correct use of the proposed algorithm, it is 
important to consider a proper post-processing of the numerical results. To 
this respect, we notice that, for instance, the velocity field u = EiVj/i/Ei/i 
is mesh-dependent: u = u(e), with u going to zero as the mesh spacing e 
vanishes. As a result, u is not the proper choice for describing the velocity 
field of incompressible flows: To this aim, instead, the quantity u = u/e is 
adopted due to mesh-independence. Similar considerations apply also to other 
fields. For consistency with the LBM literature, in this work, the units of 
quantities directly based on fi are referred to lattice units, while units of the 
corresponding mesh-independent quantities are termed physical units. 



For the sake of clarity, the complete set of formulas for converting units of 
all relevant variables are reported in Table [1} We stress that finite-difference 
formulation of the proposed method can be carried out directly in physical 
units, thus avoiding the above post-processing. Nevertheless, here we prefer 
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to keep the above post-processing for consistency with the Lattice Boltzmann 
community. 



Finally, similarly to LBM, accurate solution of INSE requires e^/ z/ ^ 1 (see 
Appendix[C]for details and the following discussion about the Minion & Brown 
flow in Section 2.5.3). Numerical evidences suggest that LW-ACM is stable 
for 1 < a; < 2, which corresponds only to half of the stability range for the 
relaxation frequency of the LBM. However, this is the most interesting range 
from the practical point of view, since it corresponds to the limit of vanishing 
viscosity (high Reynolds numbers). 



2.4 Optimized computer implementation 



In the following, we discuss a few strategies useful for reducing the compu- 
tational time, thanks to an optimized implementation of the algorithm ([s]) 
and ([5]). Similarly to LBM, the performance characteristics of single-processor 
implementations depends on the effect of different data layouts [TTj (multi- 
processor optimization strategies are not considered in the present work, see 
Ref. ^). 

First of all, the streaming step should not overwrite data required for updating 
neighboring sites. A usual way to work around the resulting data dependencies 
owing to the propagation step is the use of two arrays (one for the current and 
the other for the next time step), and toggling between them [17]. It is also 
possible (and even more efficient) using a single array, with proper ordering 
of the sequence of streamed lattice directions, though this may become cum- 
bersome when dealing with wall boundary conditions. The best data layout 
requires that the distribution functions of the current cell are contiguously 
located in memory (e.g. by using the first index in Fortran, which addresses 
consecutive memory locations due to column major order |17j). 

Secondly, to reduce the memory traffic, it is important that pre-combining, 
streaming and post-combining are executed in a single loop and not indepen- 
dently of each other in separate loops or routines, similarly to LBM [T7] . This 
goal can be easily accomplished by reformulating Eqs. ([s]) in term of /*: In 
fact, the hydrodynamic moments of /* are not exactly equal to the hydrody- 
namic quantities, but the former are known functions of the latter. Hence it 
is convenient to compute directly /*, which are ready to be streamed, and to 
extract the hydrodynamic quantities from /*. This simplifies the implemen- 
tation of a single updating loop through all computational sites at each time 
step. 

Finally, it is important to reduce as much as possible the number of fioating 
point operations and memory accesses per updated site. In LBM, the D3Q19 
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lattice (see Appendix |A] for details) with BGK collision operator requires about 
180-200 floating point operations per cell and time step as well as reading 
19 floating point values and writing to 19 different memory locations [17] . 
Roughly half of the floating point operations and half of the memory accesses 
are required by the D2Q9 lattice. 

Let us considering the D2Q9 lattice and the LBM-style formulation of LW- 
ACM, as dictated by Eqs. ([3]), with the optimization tricks reviewed above. 
Such an implementation of LW-ACM requires 115 floating point operations 
(+28% compared to BGK-LBM) and 26 memory accesses (+44%) per cell. 
However, in the following, only the computational performances of LW-ACM 
in the FD formulation are further investigated, and its superior capabilities are 
demonstrated (compared to BGK-LBM). From a computational point of view. 

Table 2 

Performance test of the link-wise ACM (by FD formulation) vs. BGK-LBM, based 
on the Minion & Brown flow [22] with Re = 10, 000 in the time range t G [0, 1], solved 
by a mesh with 512 x 512 nodes/sites and performing 12,800 iterations (Ma = 0.04). 
Both codes are serial and use double precision. The considered workstation has 
Intel® Core'^^ i7-920 (Bloomfield, 4 physical cores, 8MB L3) with clock rate 
2.67GHz (due to TurboMode'^^ actually running at 2.80 GHz) and 12 GB of DDR3 
memory (1333 MHz). The used Fortran compiler is Intel(R)version ll.lup8 (opti- 
mization level option "-03") and the operative system is Ubuntu Linux il0.04 LTS 
(64 bit). The million fluid lattice cell updates per second (MLUPS) are reported for 
both methods. 



Elementary stencil 


Link-wise ACM by FD formulation 


BGK-LBM 


# of additions/subtractions 


80 


70 


# of multiplications 


60 


40 


# of floating point operations 


140 


110 


# of actual data (t) 


27 


9 


# of updated data {t + 1) 


3 


9 


external size of the stencil 


3x3x3 


3x3x9 


MLUPS 


29.43 


27.28 



it may appear that formulas ([5]) are not suitable for an efficient implementa- 
tion, since they involve many floating point operations. However, because they 
are derived from Eq. ([T]), it is possible to simplify them using (by-hand) com- 
mon subexpression elimination (CSE) [18]. See Appendix [E| for a complete 
example based on the D2Q9 lattice [H]. Moreover, the same implementation 
tricks discussed above can be properly applied here. 

First of all, the memory storage required by link-wise ACM is exactly one third 
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of that of BGK-LBM (only hydrodynamic variables are needed). At each time 
step, it is enough to go through all computational cells/sites once and this can 
be done straightforwardly, because updating formulas are already expressed in 
terms of hydrodynamic variables. Finally, for locating the macroscopic quan- 
tities {p, u, v) contiguously in memory, it is possible to collect them in a single 
array and to use the first index for addressing them. This leads to an opti- 
mized FD-style implementation of Eqs. ([s]). On the D2Q9 lattice, a comparison 
between the FD-style implementation of link-wise ACM and BGK-LBM is re- 
ported in Table [2j Link- wise ACM requires more floating point operations but 
less memory accesses than BGK-LBM. 

For clarity, let us analyze the updating process at each time. The external size 
of the stencil of LW-ACM is smaller than the one of LBM (3x3x3 instead of 
3x3x9 respectively, where 3x3 is due to the D2Q9 lattice, 3 is the number of 
hydrodynamic quantities and 9 the number of discrete distribution functions). 
During the generic updating process, if the cache is large enough to hold 
3 "lines" (or 3 planes in 3D) of the computational domain, then updating 
the hydrodynamic quantities in a cell requires the loading of only the actual 
values of a further cell in the cache (3 loads). In the worst case, updating the 
hydrodynamic quantities in a cell requires the loading of the actual values of 
three additional cells in the cache: This amounts to 9 loads from actual array 
(3 physical quantities from the current, previous and next "line"). In any case, 
3 write-allocate transfers from main memory to cache and 3 stores to get the 
updated array from cache back to main memory are always required. This 
leads to 9-15 in total per nodal updates. 

On the other hand, BGK-LBM requires 9 loads from the actual array (discrete 
distribution function), 9 write-allocate transfers and 9 stores to the updated 
array, leading to 27 memory transfers. For BGK-LBM, there is no reuse of data 
from cache because every discrete distribution function is only used once. As 
the number of memory transfers usually affects the performance more than 
the number of floating point operations, the performance of link-wise ACM is 
superior than that of BGK-LBM. Some performance data are reported in Table 
[2j FD-style implementation of link- wise ACM was able to achieve 29.43 million 
fluid lattice cell updates per second (MLUPS), which is the standard way to 
measure the performance of LBM implementations [I7j. For the previous test, 
this value is roughly 8% faster than BGK-LBM. 

Remark: In our opinion, there is still room for improvement according to the 
performance model (based on assuming either infinitely fast memory or in- 
finitely fast compute units). For example, the numerical code for solving the 
2D lid driven cavity test case achieved 32.3 MLUPS and this was essential for 
simulating very high Reynolds number flows. Importantly, with an efficient im- 
plementation, this algorithm may be one of the few which is compute-bound 
and not memory-bound. The latter observation is of particular interest for 
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General-Purpose computing on Graphics Processing Units (GPGPU). 
2.5 Simple boundary conditions 

For sake of simplicity, numerical tests with simple boundary conditions are 
discussed first. Here, by simple boundary conditions, we mean either finite- 
difference boundary conditions (isothermal Couette flow) or periodic (gen- 
eralized Green- Taylor vortex flow and Minion & Brown flow). More general 
boundary conditions taking advantage of the LBM technology will be discussed 
in Section |2?6| 

2. 5. 1 Isothermal Couette flow 

In this section, we consider the plane Couette flow where a viscous fluid is 
confined in a gap between two parallel plates, with the one moving in its own 
plane with respect to the other. Here, two configurations are simulated by 
the present LW-ACM method on several meshes: Couette flow without wall 
injection (referred to as Test 1), and Couette flow with wall injection (referred 
to as Test 2). In the latter configuration (Test 2) fluid is injected from the 
bottom wall into the gap and extracted from the top wall with a constant 
orthogonal velocity Vq. At the stationary condition, the above configurations 
admit the following exact solutions: 

u{y) = UiL (l + f ) + ^0 - ' (Test 1), (8) 

where the Reynolds number Re is the main control parameter, L is a character- 
istic length depending on the considered test and v is the kinematic viscosity. 

For Test 1, L is half the gap height, while uq represents a velocity based on 
the imposed pressure gradient Vp: 

2pz/ 

Mi is the velocity of the top wall ul = u{y = L) and p is the density. The 
bottom wall is assumed stationary: u{y = —L) = 0. 

In case of wall injection (Test 2), L is the gap height, the Reynolds number Re 
in ([s]) is defined on the basis of the injection velocity Vq, namely Re = VqL/u. 
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Velocities at the top and bottom walls are: u{y = L) = ul and u{y = 0) = 0, 
respectively. In all simulations, no-slip boundary conditions are applied along 
the wall by simply imposing the local equilibrium with the desired velocity, 
while periodic boundary conditions are adopted at the inlet and outlet. 

Table 3 

Convergence analysis for Couette flow without (Test 1) and with (Test 2) wall 
injection in case of both diffusive and acoustic scaling. 





At oc Ax^ 


(diffusive scaling) 












Error L'^[u\ 




e = Ax 


Ma oc At /Ax 
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At oc Ax 


(acoustic scaling) 












Error L'^[u\ 




e = Ax 


Ma oc At /Ax = const. 
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Test 1 


Test 2 




1/10 


3.0 X 10-^ 


3.0 X 10-3 


4.27 X 10-2 


1.05 X 10" 


-2 


1/20 


3.0 X 10-^ 


1.5 X 10-3 


2.76 X 10-2 


5.19 X 10" 


-3 


1/40 


3.0 X 10-1 


7.5 X 10-^ 


1.54 X 10-2 


2.66 X 10" 


-3 



First of all, diffusive scaling is considered: This strategy consists in scaling 
the velocity field on different meshes, keeping fixed the relaxation frequency 
(see Appendix [C] for details). This scaling ensures second order convergence 
in the accuracy, as reported in upper part of Table [sj where the norm of 
deviation of numerical results from exact solution ^ is shown. In addition, 
acoustic scaling is considered. This strategy consists in tuning the relaxation 
frequency on different meshes in order to keep constant the computed velocity 
field (see Appendix [C] for details). This scaling ensures first order convergence 
in the accuracy, as reported in the lower part of Table [3] 

2.5.2 Generalized Green- Taylor vortex flow 

In this section, some numerical results of the Taylor-Green vortex flow are 
reported. The latter problem is widely employed as a benchmark for vari- 
ous incompressible Navier-Stokes solvers, owing to the existence of a simple 
analytical solution. The original problem is characterized by the exponential 
decay in time due to viscous dissipation. However, here the original problem is 
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modified such that it becomes periodic in time by the introduction of a proper 
external acceleration g = {gxidyY^ to Eq. (C.12), where 



g^it.x.y) = sin(x - Mot) cos{y - fot)(2z/cost - sint), 
gy{t, x,y) = — cos{x — uqI) sin(?/ — VQt){2i' cost — sint), 

with uq and Vq being constants aiming at preventing that the advection term 
balances with the pressure gradient. The modified problem admits the follow- 
ing analytical solution 

v{t,x,y) 
p{t,x,y) 

We solve the above modified problem numerically in the domain ^2 = [0 < 
X < 27r] X [0 < y < 27i] using periodic boundary condition, and {uo,Vo) = 
(0.3,0.6). The kinematic viscosity is u = 0.1. Diffusive scaling is adopted for 
all simulations, namely Ma = 271 Kn or equivalently At = Ax^ (see Appendix 
|C|for details). 



Uq + sin{x — Mot) cos{y — vqI) cost, 
vq — cos(x — Mot) sin(?/ — v^t) cost, 

-[cos2(x — Mot) + cos2(?/ — Mot)] cos^t. 



(11) 



The error data are reported in Table |4] at t = 60 for link-wise ACM, link- 
wise ACM with semi-implicit formulation (see the end of Section 2.1 where 
this formulation is presented), standard (second-order) ACM and multiple- 
relaxation-time (MRT) LBM. The standard ACM is described in Ref. [T3] . 
In the MRT-LBM [HI [21], the consistent treatment of forcing is based on 
Ref. [19]. The time step employed in the LBM computation is the same as 
the one of ACM, i.e. Ma = 27rKn or equivalently At = e^, and the tuning 
parameters of MRT are Si = S4 = S6(= Tp~^ = Tj~^) = 0, S2{= Te~^) = 
1.63, S3(= rr^) = 1.14, S5 = sy{= Tq-^) = 1.92 (see Refs. [211 EH]). All 
methods show nearly second-order convergence. Link-wise ACM shows larger 
numerical errors than ACM and MRT-LBM. However it must be stressed 
that the implementation of forcing in link-wise ACM is straightforward, while 
the consistent treatment of forcing in LBM is much more complicated [1^. 
Moreover in link-wise ACM, spatial operators (gradient and Laplace operator) 
do not need to be discretized individually (unlike ACM) but it is enough 
to discretize along the generic lattice direction: This makes much easier the 
treatment of three-dimensional cases. 



2. 5. 3 Minion & Brown flow 

LW-ACM and LBM are characterized by different values of artificial com- 
pressibility, leading to a different robustness with respect to under-resolved 
simulations. By referring to the pseudo-continuity equation for LW-ACM (see 
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Table 4 

The norm of the error versus e = Ax = Ma/27r at t = 60 in the problem of the 
generalized Taylor-Green vortex problem for v = 0.1. 



Link-wise ACM 



e = Ax Error [u] Error [ 



Error L^{p\ 



t = 11/1 iUi ±J [UJ 11/1 lUi Iv [1/J J 

7r/16 1.69262 x 10"^ 1.95356 x 10"^ 3.45590 x 10"^ 

7r/32 4.38763 x lO^^ 4.70793 x lO^^ 7.91104 x lO^^ 

7r/64 1.41952 x lO'^ 1.50330 x lO^^ 



Link-wise ACM (semi-implicit, see Section 



2.1 



Ax Error L^{u] Error L^[v\ Error L^{p\ 



7r/16 1, 
7r/32 4, 
7r/64 1, 



70353 X 10 
41348 X 10 
41906 X 10 



-3 



-3 



2.02203 X 10-2 
4.70412 X 10-3 
1.50227 X 10-3 



3.09165 X 10-2 
6.11065 X 10-3 
1.73786 X 10-3 



ACM 



e = Ax 

7r/16 8 

7r/32 1 

7r/64 5 



Error L}[u] Error L^[v\ 

03750 X 10- 
,92844 X 10" 
64682 X 10- 



1-3 



1-3 



1.00313 X 10-2 
2.47186 X 10-3 
6.61285 X 10-"^ 



Error L^\p\ 
9.70682 X 10-3 
1.93893 X 10-3 
4.34911 X 10-"^ 



MRT-LBM 



Ax Error L^{u\ Error L^[v\ Error L^{p\ 



7r/16 7. 
7r/32 1. 
7r/64 6. 



67964 X 10 
84928 X 10 
12810 X 10 



-3 



-4 



8.90307 X 10-3 
2.17164 X 10-3 
6.81060 X 10-"^ 



1.67526 X 10-2 
3.63345 X 10-3 
1.11633 X 10-3 



Eq. (C.16) in the AppendixjCj), it follows that accurate solution of divergence- 
free condition for the velocity field requires: e^/v ^ 1. This is even a more 
severe condition than the one of LBM, in case of high Reynolds number flows. 
In the following, we further explore this issue, by means of the Minion & 
Brown flow. 



Minion & Brown [22] studied the performance of various numerical schemes 
for under-resolved simulations of the 2D incompressible NavierStokes equa- 
tions. The relevance of this flow for testing robustness and accuracy of LBM 
schemes was first pointed out by Dellar [23j. Minion & Brown considered initial 
conditions corresponding to the perturbed shear layer 
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tanh(fc(|/-l/4)), y<l/2, 
u{t,x,y) = { (12) 
tanh(A;(3/4-t/)), y > 1/2, 

v{t,x,y)=6sm{2n{x + (13) 

in the periodic domain = [0 < x < 1] x [0 < y < 1]. The parameter k controls 
the shear layer width, while 6 the magnitude of the initial perturbation. The 
shear layer is expected to roll up due to Kelvin-Helmholtz instability. With 
k = 80, S = 0.05, and Reynolds number Re = l/u = 10000, the thinning 
shear layer between the two rolling up vortices becomes under-resolved on 
a 128 X 128 grid. Minion & Brown [22] found that conventional numerical 
schemes using centered differences became unstable for this under-resolved 
flow, whereas "robust" or "upwind" schemes that actively suppress grid-scale 
oscillations all produce two spurious secondary vortices at the thinnest points 
of the two shear layers at t = 1. Dellar [23] found that, even though it is stable 
on the previous mesh, two spurious vortices are generated by the BGK LBM 
scheme based on the D2Q9 lattice and with unmodified bulk viscosity. The 
same author proposed a way to increase the bulk viscosity for overcoming this 
problem, and verified that the same result can be achieved by using MRT- 
LBM. 

Link- wise ACM is stable for the previous test, but the velocity field is not 
accurate enough due to numerical viscosity. For e = 1/128 and u = 1/10000, 
indeed, the term that multiply the pressure time derivative in the pseudo- 



continuity equation (Eq. (C.16) in the Appendix [C|, namely e'^/u, reaches 
the value of 0.61, thus preventing an accurate fulfillment of the diverge-free 
condition for the velocity field. In the following, two improvements are worked 
out for circumventing the above problem. 



First, we apply the strategy discussed at the end of Section 2.2[ introducing 



a numerical fictitious viscosity z/p = 170 u, with the corresponding relaxation 
frequency Up computed by means of ^ . The relaxation frequency Up is used in 
the macroscopic updating rule (Isb). Hence, the pseudo-compressibility term in 



the continuity equation, namely the first term in Eq. (C.16), becomes propor- 
tional to e'^/up = 0.0036 ^ e^/z/ and this improves the quality of the solution. 
The vorticity at t = 1 is reported in Figure [T| where the velocity field looks 
much better, but the two spurious secondary vortices at the thinnest points 
of the two shear layers are still present (similarly to BGK-LBM). 

The second improvement follows the idea to increase the bulk viscosity ^, as 
suggested by Della r [16J. As discussed in Appendix D, instead of the standard 
ff^'^ (see Eq. (A.l) in the Appendix A) in Eq. (^7we consider a modified 



set of functions, namely //^*^ = fi^*\'y), where 7 is a free parameter related 
to the bulk viscosity ^: ^ = 2pQh' (1 + 27). This strategy enables to increase 
the bulk viscosity and represents a valuable example, showing how to use 
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1 from the link-wise ACM with the first im- 

0.04 and 



Fig. 1. Contours of vorticity at t 

provement (fp = 170 z/, see Section 2.1) on a 128 x 128 grid with Ma 
Re = 10000. See also Fig. 8 in [22]. 

moments of the updated distribution function for the local computation of 



derivatives (see the Appendix D for details). The above argument is a further 



confirmation that the present LW-ACM can easily incorporate technologies 
originally developed for LBM. 



As a concluding remark, it is worth to point out that, even though LW-ACM 
can solve this under-resolved test case by means of previously discussed im- 
provements, the average and peak values of the velocity field divergence remain 
slightly larger than those computed by MRT-LBM. For sake of completeness, 
optimized ACM [15] yields much smaller values of velocity field divergence 
than those of MRT-LBM. However, it is important to keep in mind that the 
Minion & Brown flow is a very severe test due to the small initial perturbation 
6, which realizes a very sharp boundary layer. Hence this test is a multi-scale 
problem and some of the regularity assumptions used in deriving the numerical 
schemes may not hold completely. 



2.6 Link-wise wall boundary conditions 



The link-wise formulation of the proposed method offers signiflcant advantages 
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Fig. 2. Contours of vorticity at t = 1 from the link-wise ACM with the both sug- 
gested improvements (fp = 170 z/, see Section 
128 X 128 grid with Ma = 0.04 and Re = 10000. 

when dealing with wall boundary conditions. First, let us consider simple 
structured boundaries, where walls are aligned along the mesh (for general 
cases, the reader can refer to the next section). In particular, let us suppose 
that walls are located halfway between two consecutive nodes in an ideal step- 
wise geometry. Instead of using Eqs. (|3]), it proves convenient focusing on the 
quantities /* (after pre-combining) streaming out of a single point x ( "push" 
formulation) 

/;(x,£) = #)(x,£) - 2 (^) /^"^(x,£). (14) 

Let us suppose that x is a fluid node close to a complex wall boundary at rest 
such that x+Vj is a wall node. In an ideal step-wise geometry, the wall location 
is assumed halfway between x and x + Vj. Hence, during the streaming step, 
the information stored in the discrete distribution function pointing towards 
the wall are reflected along the same link by the wall (according to the popular 
bounce-back rule), namely 

/rs(,)(x,t + i) = /;(x,t), (15) 

where BB{i) is the bounce-back operator giving the lattice link opposite to i- 



2.1, and 7 = 0.4, see SectionO) on a 
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th. Finally the post-combining step can be performed in the usual way, namely 



UJ- 1 



/eB(,)(x,t + 1) = /**^(,)(x,t + 1) + 2 (^^-— j (16) 

Considering that Vj = —'VBB(i) and //'^'°'' = —fBsli)-! combination of pre- 
vious steps yields 

+ 1) = /f Hx,t) + (2 - ^) 2/J4)(x,t). (17) 



In case of moving complex boundary with velocity u^„, the procedure reported 
in [21] (here reformulated in terms of fBB{i)) suggests the inclusion of the 
additional term 



SfBB{i){po, u^) = 2/^5% (Po, u^), (18) 

where f^'^li) is given by Eq. (2) and po is the averaged density over the whole 
computational domain (see Appendix [C] for details). For sake of simplicity, let 
us consider the diffusive scaling: Aa; = e ^ 1, At = e^. Hence, the incom- 
pressible limit implies p(x) = po + O(e^). Moreover, the point x is only e/2 
away from the moving wall. Combining the previous conditions, the following 
approximation holds u(x) = + O(e^) and 

c(e,o) +\ _ Ae,o) 



«i)(x,t) = /],i^?.)(/'0,uJ+O(6^), 



meaning that the rightmost term of Eq. (17) produces a similar effect to the 



correction (18). Hence, the suggested correction for LBM will be multiplied 
by a scaling factor (complement to one of the factor multiplying the last term 
in Eq. (|T7|)) in link- wise ACM, namely 



/]^B(,)(x,t + 1) = /BB(.)(i,t + 1) + - 1) <^/bb«(Po,u^), (19) 

where f^Bii) i^ proper boundary condition in case of moving boundary. 

Link-wise formulation is also very useful in computing hydrodynamical forces 
acting on bodies. A popular approach in LBM literature is based on the so- 
called momentum exchange algorithm (MEA), originally proposed in [2^ and 
lately improved in (see for a complete discussion). In LBM, at every 
time step, the momentum 



Pi = Vi/r' (X,t) - ^BB(i)fBB(i){^,t+ 1) = Vi [/r' (X,t) + fBB{i){^,t+ 1 

(20) 

is transferred from the fluid to the solid body {ff°'^^ is the post-collisional 
distribution function). For sake of simplicity, here we restrict ourselves on 
bodies at rest. In link-wise ACM, the quantity /f°^* is purposely defined in 
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order to get rid of the last term in the post-combining step, namely 



/r*(x,t) = /;(x,t)-2 



UJ 



fBB{i) 



Introducing the previous definition in Eq. (pOj) yields 



Pi 



/;(x,t) + (x,t + i 



(21) 



(22) 



The previous expression for link-wise ACM is general. In case of step-wise 



geometries (as those considered in this section), the expression (15) holds and 
Eq. (22) can be recast as: pi = 2vj/*(x, t). The force exerted on a body is 



computed by a summation of the contributions (22) over all the boundary 
links surrounding its surface: 



(23) 



where S is the set of links starting from all surrounding nodes intersecting the 
body. 



Conversion of the force (23) from lattice units to physical units requires sub- 



straction of the hydrostatic component J-q generated by the averaged density 
field po (see also the Appendix [C] for details). Computing the hydrostatic force 
on partial boundaries of a body by Eq. (23 and 22) can be accomplished after 



and f*BB{iy ioice 



exclusion of the velocity- dependent components of /, 

is computed on the entire body surface, the hydrostatic force is null. The re 
maining quantity J-" — J-q scales as the second order tensor: in case of diffusive 
scaling J-" — J-q ~ e^. However the number of points in the set S increases pro- 
portionally to 1/e. Consequently the following scaling holds (see also Table 





A^=(^- J-o)/e^ 



E Pi 

jeS(i/e) 



A- 



(24) 



2. 6. 1 2D lid driven cavity flow 

In this section, the effective enhanced stability of the present LW-ACM method 
is tested by means of the classical two-dimensional (2D) lid driven cavity 
problem (see also |33], [37]). Such a benchmark has been considered owing to 
a singularity of the pressure in the lid corners, which makes it a severe test 
for robustness of numerical schemes, especially starting from moderately high 
Reynolds numbers. In all simulations, we consider a square domain (x, y) G 
[0, L] X [0, L], with L = 1. Such a domain is discretized by uniform collocated 
grid with N x N points. The boundaries are located half-cell away from the 
computational nodes. Let us denote Xf, the generic boundary computational 
node. In all inner nodes (x 7^ x^), Eq. ([T| holds for any lattice velocity Vj. 
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In an arbitrary boundary node x;,, Eq. ([T]) holds for any lattice velocity Vj 
such that Xfo + Vj is still a computational node. In case x;, + Vj falls out of 
the computational grid, the boundary condition ( 19 ) is applied, with being 
the boundary velocity (imposed half-cell away from the boundary node Xb). 
In the following numerical simulations, u^^ = (^^^,0)^ at the lid wall, where 
Ml is the lid velocity, and = for all the other walls. At the lid corners, 
the lid velocity is imposed, while for other corners the boundary conditions 
(19) are adopted. 




Fig. 3. Streamlines for the lid driven cavity flow with Re = 5000. Different numerical 
methods and different meshes are compared. The location of the four minima of the 
stream-function is denoted by filled circles. 

In Figs. [3]and|4[ numerical results corresponding to several grids and methods 
are reported for Reynolds number Re = ulL/v = 5000. In this case, it is 
known that the flow is characterized by four main vortexes, whose actual 
centers can be found by searching for local extrema of the stream function ip 
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Fig. 4. Pressure contours for the lid driven cavity flow with Re = 5000. Different 
numerical methods and different meshes are compared. 

defined as: 

u=— v = (25) 
dy' 9x ' 

with u and v being the horizontal and vertical component of the velocity field, 
respectively. 

For standard Lattice Boltzmann method [9] with BGK collisional operator and 
D2Q9 lattice, the coarsest grid which ensures numerical stability was found 
to be 250 X 250. On the other hand, the present LW-ACM method can be 
safely adopted with 125 x 125 grid, and reasonably accurate results are found 
as reported in Figs. |3] and |4j In addition, LW-ACM shows stability even with 
50 X 50 grid (1/25 the total number of nodes needed by BGK-LBM method), 
where LW-ACM is still able to describe the main features of the 2D cavity 
flow at Re = 5000. 

In Fig. |4]the pressure contours for the same test are reported. As visible in the 
upper-left part of Fig. |4]the BGK solution shows some checkerboard pressure 
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Fig. 5. Comparison of the velocity field for the lid driven cavity flow at Re = 5000 
with 128 X 128 grids: horizontal component u and vertical component v are reported 
on the left and right hand side, respectively. Thin and bold lines denote the present 
LW-ACM (with Vp = v) and optimized ACM solution [13j, respectively. The lat- 
ter method is based on a second order accurate scheme in time, and fourth order 
accurate scheme in space (bulk fluid). 

distribution at the top corners of the cavity. The mesh resolution is still enough 
to overcome the checkerboard instability mechanism: however this comes at 
the price of a very large computational domain (larger than 250 x 250). On 
the other hand, no such a problem was noticed with LW-ACM, even for quite 
coarse grids (down to 50 x 50). The absence of spurious oscillations in the 
numerical solutions by the artificial compressibility method (ACM) for this 
test case has been already pointed out [13]. Hence, the previous numerical 
evidences demonstrate that also the present link-wise formulation of ACM 
inherits the same feature. 

It is worth stressing that numerical stability on coarse grids, yet with poor 
accuracy, is a highly desirable feature in several engineering problems, e.g. 
where a loose grid resolution of details of no interest (for the overall flow 
phenomena) should not prevent global convergence of the code. 

Clearly the numerical schemes should be compared in terms of the actual 
accuracy as well. From the very beginning, it is worth to point out that all 
considered methods (i.e. BGK-LBM, MRT-LBM, ACM, LW-ACM) are based 
on artificial compressibility and even steady state solutions depend on the 
numerical Mach number (in particular, the pressure gradients depend on the 
Mach number, as well as the number of time steps). In particular, reducing 
the numerical Mach number improves the quality of the results. See Appendix 
[Cjfor details. Hence a fair comparison among different methods requires using 
the same numerical Mach number. 
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Fig. 6. Comparison of the pressure field for the Ud driven cavity flow at Re = 5000 
with 128 X 128 grids. Thin and bold lines denote the present LW-ACM (with Vp = i/) 
and optimized ACM solution [13j, respectively. The latter method is based on a 
second order accurate scheme in time, and fourth order accurate scheme in space 
(bulk fluid). 

The flow flelds of a 2D lid-driven cavity problem with Re = 5000 and 128 x 128 
grid, as predicted by the optimized ACM method [13] and the present LW- 
ACM (with Mach number Ma = 0.2 and Up = p), have been compared. Here 
optimized ACM means that (a) high wave numbers are damped for the sup- 
pression of the checkerboard instability and (b) the Richardson extrapolation 
in the Mach number (except around top singular corners) is employed [13] . As 
reported in Figs. |5] and [6} LW-ACM shows both a smoother and more accurate 
behavior (see also Table Isj), beside a remarkably simpler implementation. 



Moreover, in Figs. [7] and |8| we report the flow flelds of a 2D lid-driven cavity 
problem with Re = 5000 and 256 x 256 grid as predicted by the optimized ACM 
[13] and the present LW-ACM (with Mach number Ma = 0.2 and Up = u), 
where a small mismatch between the two solutions is observed. This time the 
optimized ACM method is able to reproduce the reference results [j33j with 
excellent accuracy (despite the use of a much coarser grid: 256^ vs. 2048^), 
although minor pressure oscillations are still visible at the lid corners and this 
affects, e.g., the prediction of entrophy (see Eq. (26) and Table [s]). Differences 
between ACM and LW-ACM are mainly due to: 1) different accuracy of the two 
schemes (second and flrst order accuracy in time for ACM and the present LW- 
ACM respectively, whereas fourth and second order accuracy in space for ACM 
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Fig. 7. Comparison of the velocity field for the lid driven cavity flow at Re = 5000 
with 256 X 256 grids: horizontal component u and vertical component v are reported 
on the left and right hand side, respectively. Thin and bold lines denote the present 
LW-ACM (with Vp = v) and ACM solution [T^], respectively. The latter method is 
based on a second order accurate scheme in time, and fourth order accurate scheme 
in space. 



and the present LW-ACM respectively), and 2) slightly different treatment of 
boundaries. ACM imposes boundary conditions on the computational nodes, 
while in LW-ACM, analogously to LBM, the wall boundary conditions are 
imposed half cell away from the computational node. This allows one to avoid 
singularities, which appear in the top corners for this test case. Despite all 
this, it is fair to say that the agreement between the above two solutions is 
quite good. 



In Figs. [9] and [TOl the flow field computed by the LW-ACM with Re = 5000 
and 256 x 256 grid is compared to a reference solution from the literature [33] , 
where a state of the art code based on finite differences is used with a remark- 
ably fine grid (2048 x 2048). Despite a significant disparity in the number of 
computational nodes (LW-ACM makes use of 1/64 the nodes adopted for the 
reference solution), an excellent agreement is found. It is worth stressing that, 
the above problem was also simulated by the multiple relaxation time lattice 
Boltzmann method (MRT-LBM) with 256 x 256 grid. Comparison between 
MRT-LBM and the reference solution is also very good, however the issue of 
spurious pressure oscillations in the upper-left corner of the cavity could not be 
avoided. Further comparisons between the LW-ACM and other methods are 
proposed in Table |5] Here, coordinates of the primary vortex center {xp,yp), 
coordinates of the lower- right vortex center {xir,yir), total kinetic energy E 
and enstrophy Z are reported for several schemes and grids, where the latter 
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Fig. 8. Comparison of the pressure field for the lid driven cavity flow at Re = 5000 
with 256 X 256 grids. Thin and bold lines denote the present LW-ACM (with = z^) 
and ACM solution [13], respectively. The latter method is based on a second order 
accurate scheme in time, and fourth order accurate scheme in space. 



— 0.5 




— 0.5 




Fig. 9. Comparison of the velocity field for the lid driven cavity flow at Re = 5000: 
horizontal component u and vertical component v are reported on the left and right 
hand side, respectively. Thin and bold lines denote the present LW-ACM (256 x 256 
grid, with Vp = v) and the reference solution |33j (2048 x 2048 grid), respectively. 

two quantities are computed as follows: 



E = - 



\ f Wufdn^lAxAyY: 



(26) 
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Fig. 10. Comparison of the pressure field for the hd driven cavity flow at Re = 5000. 
Thin and bold lines denote the present LW-ACM (256 x 256 grid) and the reference 
solution ^ (2048 x 2048 grid), respectively. 

with uj = dxV — dyU, Ax and Ay being the vorticity and the grid spacings re- 
spectively. In our study, we notice that one consequence of spurious pressure 
oscillations in the solution of classical lattice Boltzmann schemes is that both 
BGK-LBM and MRT-LBM show remarkable inaccuracy in recovering the en- 
strophy value predicted by the reference ^33j, whereas the present LW-ACM 
overcomes the above issue. 

Finally, based on Figs. [3} |4| |5j [6| [7| |8| |9| [10} on comparisons of local and global 
quantities proposed in Table |5| we can conclude that LW-ACM represents an 
excellent alternative in terms of simplicity, stability and accuracy. 



Remark-In this study, towards the end of making an extensive comparison 
among state of the art INSE solvers, simulations are performed by different 
methods and grids as reported in Table [5j Since boundaries may be located 
differently for different methods (e.g. unlike ACM [13j where boundaries co- 
incide with computational nodes, in LW-ACM boundaries are half cell away 
from computational nodes), upon convergence, all the fields are first interpo- 
lated (by cubic spline interpolation) on a shifted grid (same size as the one 
used for fluid dynamic computations) having the boundaries located on the 
computational nodes. The values of global kinetic energy and enstrophy given 
by Eq. (26) are based on the latter shifted grids. 
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Table 5 

2D lid driven cavity flow at Re = 5000: Comparison between the present LW- 
ACM (with Mach number Ma = 0.2, v = Vp) and alternative solvers for INSE 
from literature [T31 [HI EOl [3Tj . In artificial compressibility methods (LW-ACM, 
BGK-LBM, MRT-LBM), even steady state solutions depend on the numerical Mach 
number: Hence a fair comparison among different methods requires using the same 
numerical Mach number (Ma = 0.2 in this case). ^This is the optimized version of 
the ACM method proposed in [13] where (a) high wave numbers are damped for 
the suppression of the checkerboard instability and (b) the Richardson extrapolation 
in the Mach number (except around top singular corners) is employed. * Owing to 
both the accuracy of the scheme and the size of meshes adopted, these results are 
considered as a reference for the present study. However, since enstrophy Z for 2D 
lid-driven cavity goes to infinity as the grid spacing goes to zero [33] , a meaningful 
comparison for Z is among similar grids. 



Scheme 


Grid 


\-^pi yp) 


{Xlr,yir) 


Energy 


Enstrophy 


Present 


128 X 128 


(0.51652,0.53754) 


(0.81081,0.079079) 


0.039845 


29.247 


m 


128 X 128 


(0.52052,0.53954) 


(0.82883,0.071071) 


0.027430 


41.249 


m 


128 X 128 


(0.51652,0.53854) 


(0.80981,0.072072) 


0.038371 


37.704 


BGK-LBM 


128 X 128 


unstable 


unstable 


unstable 


unstable 


MRT-LBM 


128 X 128 


(0.51652,0.53554) 


(0.80881,0.075075) 


0.043600 


37.404 


[33]* 


128 X 128 


(0.51562,0.53906) 


(0.80469,0.070313) 


0.043566 


30.861 


Present 


256 X 256 


(0.51552,0.53554) 


(0.80581,0.074074) 


0.044391 


34.821 


m 


256 X 256 


(0.51652,0.53654) 


(0.80881,0.072072) 


0.040896 


43.198 


m 


256 X 256 


(0.51451,0.53654) 


(0.80380,0.072072) 


0.048114 


42.290 


BGK-LBM 


250 X 250 


(0.51752,0.54054) 


(0.80781,0.074074) 


0.041614 


40.455 


MRT-LBM 


256 X 256 


(0.51552,0.53554) 


(0.80681,0.074074) 


0.045222 


40.833 


[33J* 


256 X 256 


(0.51562,0.53516) 


(0.80859,0.074219) 


0.046204 


34.368 


[33]* 


2048 X 2048 


(0.51465,0.53516) 


(0.80566,0.073242) 


0.047290 


40.261 



Moreover, if one performs the calculation of streamlines and vortex locations 
on the basis of the same nodes of the fluid dynamic grid, the final accuracy 
will depends on both the accuracy of the numerical solution and the grid it- 
self. Hence, results on coarse grids are penalized twice. Therefore, towards 
the end of computing the coordinates {xp,yp) and {xir,yir), all the hydrody- 
namic fields (as computed by the several schemes and different meshes) are 
first interpolated (by cubic spline interpolation) on a larger mesh, and there- 
after the stream function and the vortex locations are computed. The latter 
post-processing procedure is composed of the following subsequent steps: 1) 
interpolation of the results on fixed fine grid (1000^ in TableO; 2) computation 
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of the stream function and its local extrema denoting the vortex centers. 
2.6.2 3D diagonally driven cavity flow 










~ - - 

/.'''' 




/ ^ 


\. /mp 









lid 

direction 



* X 



Fig. 11. Cavity flow with the lid moving along its diagonal. Velocity components 
along axes x, y and z are denoted n, v and w respectively. 

One of the main advantage of the proposed link-wise formulation of ACM 
consists in its independence on the space dimensionality (as far as the con- 
sidered equilibrium satisfies the constraints required by the target equations: 
see Appendix [C] for details). As a result, the extension of LW-ACM to three- 
dimensional fiows is straightforward. In the following calculations, the D3Q19 
lattice [9] will be used: even though that is not a Hermitian lattice (such as 
D3Q27), the former lattice allows to satisfy the constraints required by the 
Navier-Stokes equations (in particular Eq. ( jC.lO )). 



Here, we have chosen the three-dimensional (3D) diagonally lid-driven cavity 
fiow, which is a classical benchmark for numerical solvers of the incompress- 
ible Navier-Stokes equations (see also [28l |30l ES])- The cavity is a cubic box 
with unit edge as schematically sketched in Fig. [TTJ The boundary condition 
at the top plane (x,?/, 1) is = (-\/2, -\/2, 0)/20 so that = ||ul|| = 1/10, 
whereas the remaining five walls are subject to no-slip boundary conditions. 
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The computational domain is discretized by a uniform collocated grid with A^^ 
nodes, with boundaries located half-cell away from the computational nodes. 
Towards the end of making a comparison with data from literature, calcula- 
tions have been performed by the LW-ACM at two Reynolds numbers studied 
in [281 Eg and [30] (Re = 700, Re = 2000), and two grids: iV = 48 and 
= 60. Let us denote the generic boundary computational node. In all 
inner computational nodes (x ^ x^), Eq. Q holds for any lattice velocity Vj. 

In this test case, numerical stability is significantly affected by boundary con- 
ditions. For that reason, in Ref . [30] , Authors suggest to use equilibrium-based 
boundary conditions for the sliding wall at relatively high Reynolds numbers 
on small computational grids. Nevertheless, as pointed out [30], this imple- 
mentation imposes an incorrect constant pressure at the boundary, with the 
momentum transfer significantly weakened in the direction perpendicular to 
the lid. Moreover, in Ref. [30], the "node" bounce-back boundary conditions 
are applied to the remaining five walls for imposing no-slip boundary condi- 
tions. Although such an approach reduces oscillations caused by the parity in- 
variance and thus enhances the numerical stability, the several simplifications 
discussed above were necessary to simulate the 3D cavity flow with Re = 2000, 
L'3Q15 lattice and 52^ grid. 

On the contrary, the present LW-ACM method does not need to resort to 
the above simplifications any longer. At an arbitrary boundary node x^ Eq. 



(19) holds, with u^^, being the boundary velocity (imposed half-cell away from 
the boundary computational node Xt„). This increases the accuracy in treat- 
ing the boundaries (compared to [30]) and, most importantly, it makes prob- 
lems in three-dimensions just a straightforward extension of the ones in two- 
dimensions (see previous section). 



In Figs. 12 , 13 and 14 we report a comparison between the velocity fields (in 



the MP, CP and PP planes of Fig. 11 ) by both the commercial code FLUENT 
(non-uniform 68^ grid) [2H1 120] , here considered as a reference, and the present 
LW-ACM (60'^ uniform grid) at Reynolds Re = 700: All the flow structures 
are correctly reproduced by LW-ACM. Moreover, based on a comparison of 



both local quantities in Fig. 15 and parallel/perpendicular global momenta 
Mil, Ml : 



Mil 
" 2 



Ml 



1 /" 1 

- y (m + vjfdV ^ -^AxAyAz ^ (wjj- fc + Wij- fc)^ , 
/ 1 '''' 2 (27) 

^ JV Z . 7. 



reported in Table [6} we can conclude that the present LW-ACM is indeed able 
to recover the reference solution with significant accuracy. 
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In Figs. 16 17 and 18 results of the 3D lid-driven cavity flow at higher Reynolds 
number, Re = 2000, are reported. It is worth stressing that, here all the main 
structures of the flow are correctly described by LW-ACM even with grids 
coarser than the one adopted in the reference solution [281 [29]. We stress that, 
describing secondary vortexes in this case is known to be a severe test for 
numerical schemes (in particular catching top-left and bottom-right secondary 
vortexes). 

For the sake of completeness, we also notice as the Reynolds number increases 
larger deviations of the LW-ACM solution from the reference are observed in 
terms of the parallel/perpendicular global momenta My, M± (see Table |6]). 



Finally, in Fig. 19 the numerical results obtained by both LW-ACM and the 
MRT-LBM |30] are shown. These two simulations are not perfectly compara- 
ble each other. In fact. Authors in Ref. [30] were forced by stability issues to 
implement some simpliflcations when dealing with the boundary conditions 
(mainly, equilibrium-based boundary conditions for imposing the lid velocity 
and "node" bounce-back for the no-slip boundary conditions). Those simpli- 



flcations reduce the accuracy with regards to that recovered by Eq. (19) 



Other minor difference is that [SU] and the present study were obtained by the 
D3Q15 lattice with 52'^ grid, and D3Q19 lattice with 48^ grid respectively. We 
notice that, in this case, MRT-LBM makes use of a larger number of degrees of 
freedom compared to LW-ACM: 52^ x 15 > 48^ x 19. In spite of this, it is quite 



clear by Fig. 19 that the pressure fleld recovered by the present LW-ACM is 
remarkably smoother than the one obtained by MRT-LBM. More speciflcally, 
a crucial difference is that LW-ACM predicts smooth pressure increase at 
the top-right corner, while the MRT-LBM results are affected by oscillations 
around the imposed constant pressure at the top plane. 

Table 6 

3D diagonally driven cavity: Volume integral of momentum flux. Comparisons are 
carried out between the present LW-ACM method and the reference solution in [28] 
adopting 60'^ and 68^ total number of grid nodes, respectively. 

Present, Re = 700 [281 EH], Re = 700 Present, Re = 2000 [281 [29], Re = 2000 
JyM\\ 0.203 x 10-1 0.216x10-1 0.134x10-1 0.163x10-1 

JyM_i_ 0.232 x 10-2 0.283 x 10-2 0.174x10-2 0.239 x 10-^ 



2.6.3 Circular Couette flow 

Dealing with moving complex boundaries is very important in many applica- 
tions: for example, particle suspensions, granular flows and active (bio-) agents 
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x[-] x[-l 



Fig. 12. Flow in the middle plane z = 0.5 of the 3D diagonally driven cavity (MP 
plane in Fig. 11) at Re = 700. Comparison between the LW-ACM with 60^ grid 



(left) and a reference solution obtained by the commercial code FLUENT (right) 
with 68^ total number of grid nodes. 

immersed in the flow. In these cases, the essential issue is to reduce as much 
as possible the computational demand by avoiding re-meshing every time that 
the considered objects move in the flow. Taking into account moving objects 
is also complicated by the need of re-initializing the portions of the flow field 
which are filled again by the fluid after the motion of the objects. The lat- 
ter feature is neglected here, because it is a general issue, not peculiar of the 
link-wise methods. 

First of all, we extend the wall boundary treatment discussed in the previous 
sections. Let us suppose that x is a fluid node close to a complex wall boundary 
at rest such that x + Vj is a wall node. Let us focus on the intersection between 
the i-th lattice link and the wall. The distance between the latter intersection 
and the fluid node, divided by the mesh spacing Ax, gives the normalized 
distance < g < 1 (above we considered only the case: q = 1/2). In this case, 
the streaming step can be performed following the same procedure provided 



for LBM in [23] (instead of using Eq. (16)), namely 



BB{i) 



X,t+1) 



2g/*(x,t) + (1 - 2g)/*(x - ev„£), q < 1/2, 



2q 



/^B.,)(x,t), g> 1/2, 



where BB{i) is the bounce-back operator giving the lattice link opposite to i- 
th. Finally the post-combining step can be performed in the usual way, namely 
by means of Eq. (16). 
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perpendicular to lid [-] 



Fig. 13. Flow in the plane perpendicular to the direction of the lid (CP plane in Fig. 
11) at Re = 700. Comparison between the LW-ACM with 60^ Cartesian grid (top) 
and a reference solution [28j obtained by the commercial code FLUENT (bottom) 
with 68^ total number of grid nodes. At the centerline, a stagnation point is observed 
at z = 0.68 (present), and z = 0.74 (FLUENT). 

In case of moving complex boundary with velocity u^„, the procedure reported 
in [21] suggests to consider an additional term, namely 



2/i'Bt)(Po,uJ, g<l/2. 



1 f{<^,o) ( 



q > 1/2, 



(2J 



where f^sli) given by Eq. (2 ) and po is the average value of the density over 
the whole computational domain (see Appendix [C| for details). Similarly to 
what we did in the previous sections, in case of diffusive scaling, the suggested 
correction for LBM will be multiplied by a scaling factor in link-wise ACM. 
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Fig. 14. Flow in the plane parallel to the direction of the lid (PP plane in Fig. [TT| ) 
at Re = 700. Comparison between the LW-ACM with 60^ Cartesian grid (top) and 
a reference solution |28j obtained by the commercial code FLUENT (bottom) with 
68^ total number of grid nodes. 



Hence fsBii) given by Eq. (19) is the proper boundary condition in case of 
moving complex boundary. 



In this section, numerical results are reported for the circular Couette flow, 
where a viscous fluid is confined in the gap between two concentric rotating 
cylinders. In our study, we assume the inner cylinder (with radius r^) at rest 
while the outer cylinder (with radius r^) rotates at a constant angular velocity 
l/rg. The latter flow admits the following exact solution: 
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Fig. 15. Velocity profile along the line ML (left) and the line RL (right) at Re = 700 
(see also Fig. 11). Comparison between the present LW-ACM with 60^ Cartesian 
grid and a reference solution [28] obtained by the commercial code FLUENT with 
68^ total number of grid nodes. 
Table 7 

The norm of the error versus e = Ax = Ma at t = 20 in the problem of the 
circular Couette flow for = 0.07. 



Link- wise ACM 



e = Ax Error 



e = Ax 
1/20 
1/40 
1/80 



Error [p] Error L-*^ [ 7~ ] 



1/20 1.53627 X 10"^ 8.81208 x 10""^ 
1/40 3.50537 x lO""^ 3.58432 x 10^"^ 
1/80 1.77257 X lO""^ 1.97650 x IQ-"^ 
1/160 3.42570 x 10"^ 6.16474 x 10"^ 

MRT-LBM 



Error L^[ug] 
4.66795 X 10-3 
1.52864 X 10-^ 



Error L^[p\ 
2.52316 X 10-3 
8.47929 X 10-"^ 



3.08607 X 10-"^ 2.99584 x lO""^ 



1/160 7.99695 x 10"^ 1.09817 x 10 



1-4 



u(t,r,e) = -C (- - -) sm(e), 
\ri r ) 

i;(t,r,^) = cf---)cos(e), 
\r,- r / 



p(t,r,^)=p, + C2ln(^-| 
r = 47rCz/ri, 



4.20562 X 10-"^ 
1.98687 X 10-^ 
9.13289 X 10^^ 
3.66160 X 10"^ 

Error L^[\T\] 
7.58091 X 10-5 
1.39351 X 10-"^ 
6.98541 X 10-5 
3.39760 X 10-5 



(29) 
(30) 

(31) 
(32) 
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Fig. 16. Flow in the middle plane z = 0.5 of the 3D diagonally driven cavity (MP 
plane in Fig. at Re = 2000. a) LW-ACM with 48^ uniform grid, b) LW-ACM 



with 60^ uniform grid, c) LW-ACM with 68^ uniform grid, d) Reference solution by 
the commercial code FLUENT with 68^ non- uniform grid [29j. 

where 

C = , ^ , . (33) 

Here, -u, -u, p and T denote horizontal velocity, vertical velocity, pressure and 
the torque on the inner cylinder respectively, with 9 being the angle between 
radial direction and the horizontal axis. A schematic representation of this 



setup is reported in Figure [20j Diffusive scaling is considered for this test 
case, where the velocity field is scaled on meshes with different sizes, keeping 
fixed the relaxation frequency (see Appendix [C] for details). The latter scaling 



ensures second order convergence in the accuracy, as reported in Table [7j 
Moreover, the torque exerted by the fluid on the inner cylinder is computed. 



To this end, Eq. (22) is applied in combination with the boundary conditions 



provided by (28 ). The computation of T was finally performed by a summation 
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0.5 1 
perpendicular to lid [-] 



Fig. 17. Flow in the plane perpendicular to the direction of the lid (CP plane in 
Fig. fTT| at Re = 2000 obtained by the LW-ACM with 60^ uniform grid. At the 



centerline, a stagnation point is observed ai z = 0.405. 



of the contributions (22) over all the boundary links around its surface, namely 



r = E(^-^c)xp„ (34) 

ieS 

where Xc is the center of the cylinders and S is the set of links starting from 
all nodes surrounding the body and intersecting the body itself. Similarly to 
what has been done for scaling the force exerted on a body, the above torque 
(assumed acting on the whole inner cylinder) must be converted from lattice 
units to physical units (see Table [l] for details). More specifically, the formula 
is the following 

r= (x-x,)xp„ (35) 

ieS(l/e) 

because |x — x^ ~ 1/e and this automatically takes into account the force 



scaling reported in Eq. (24) 



In Table [7], the numerical results for the circular Couette flow are reported. As 
expected, the numerical solution in terms of velocity and pressure shows al- 
most second order convergence rate. On the other hand, the modified MEA for 
link-wise ACM in computing T shows first order convergence rate (similarly 
to the original MEA for LBM). 
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Fig. 18. Flow in the plane parallel to the direction of the lid (PP plane in Fig. 11) 
at Re = 2000 obtained by the LW-ACM with 60^ uniform grid. 

It is important to highlight that, by taking into account the curvature cor- 
rectly (e.g. by finite- volume ACM using body- fitted cell system), the accuracy 
is dramatically improved. The reason is that, in link-wise ACM and in LBM, 
the boundary condition for curved wall is based on one dimensional inter- 
polation, but this method is not accurate for computing stresses (depending 
on local spatial derivatives). Hence for accurate solving boundary layers, the 
finite-volume ACM using body-fitted cell system is preferable. However, in 
the present paper, we used link-wise boundary conditions because they are 
extremely simple to be generalized in three dimensions and they have poten- 
tial when dealing with moving complex obejcts (e.g. particles) for avoiding 
re-meshing. Hence, which formulation to chose is a matter of the considered 
application. 



3 Conclusions 



In the present work, a novel method for low Mach number fiuid dynamic sim- 
ulations is proposed, taking inspiration from the best features of both the 
Lattice Boltzmann Method (LBM) and more classical computational fiuid 
dynamic (CFD) techniques such as the Artificial Compressibility Method 
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Fig. 19. Pressure contours for the diagonally driven cavity flow for Re = 2000 at 
the lateral mid-plane {y = 0.5): Solution obtained by present LW-ACM with 48^ 
uniform grid and D3Q19 lattice (left), and MRT-LBM method with 52^ uniform 
grid and D3Q15 lattice |30] (right). 

(ACM). The main advantage is the possibihty of exploiting well established 
technologies originally developed for LBM and classical CFD, with special 
emphasis on finite differences (at least in the present paper), at the cost of 
minor changes. For instance, like LBM, it is possible to use simple Cartesian 
structured meshes, eventually recursively refined in the vicinity of solid walls, 
and there is no need of solving Poisson equations for pressure. On the other 
hand, any boundary condition designed for finite difference schemes can be 
easily included. 

As far as solving incompressible Navier-Stokes equations - INSE - (by mini- 
mal amount of unknowns) is the only concern, the pseudo-kinetic heritages of 
LBM represent a severe limitation to several aspects such as designing flexible 
boundary conditions, introducing tunable forcing terms and analyzing consis- 
tency of the numerical scheme (asymptotics) . On the contrary, the suggested 
method has no such pseudo-kinetic heritages. Or in other words, following the 
standard LBM nomenclature, the present LW-ACM requires no high-order 
moments beyond hydrodynamics (often referred to as ghost moments) and no 
kinetic expansion such as Chapman-Enskog, Hilbert, van Kampen. Like finite 
difference schemes, only standard Taylor expansion is needed for analyzing 
consistency. Beside the above aspects, numerical evidences reported in this 
work suggest that LW-ACM represents an excellent alternative in terms of 
simplicity, stability and accuracy. Hence, in this framework (solving INSE by 
minimal amount of unknowns), the utility of high-order moments is question- 
able. 
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cylinder at rest rotating cylinder 



Fig. 20. Set-up of the circular Couctte flow, where one quarter of the domain is 
reported. For the sake of clarity, a significantly coarse grid is represented. 

Finally, preliminary efforts towards optimal implementations have shown that 
LW-ACM is capable of similar computational speed as optimized (BGK-) 
LBM. In addition, the memory demand is significantly smaller than (BGK-) 
LBM. In our opinion, there is still room for improvement according to the per- 
formance model (l:)ascd on assuming either infinitely fast memory or infinitely 
fast compute units). Importantly, with an efficient implementation, this algo- 
rithm may be one of the few which is compute-bound and not memory-bound. 
The latter observation is of particular interest for General-Purpose computing 
on Graphics Processing Units (GPGPU). 
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A Appendix: Equilibrium distribution functions 

The quantities f- ^^ are designed in order to recover the incompressible isother- 
mal fluid dynamics. For sake of completeness, we report here the explicit ex- 
pressions of the equilibrium functions for some popular lattices. 

The D2Q9 lattice [H], suitable for two dimensional problems {D = 2), consists 
of the following discrete velocities (Q = 9): vq = (0, 0), Vj = (±1, 0) and 
(0, ±1), for i = 1-4, and Vj = (±1, ±1), for i = 5-8, where the i-th equilibrium 
distribution function f^^^ reads 



/f ^ =Wip 1 + 3vi ■ u + - (vj ■ u)^ 



3 2 

-u 

2 



(A.l) 



with p the fluid density, and Wi the weights 



4/9 i 







Wi = < 1/9 



1-4, 



(A.2) 



1/36 i 



5-8. 



More explicitly, the complete set of equilibria takes the form: 




A/9 p - 2/3 pu^ -2/3 pv^, 
1/9 p+1/3 pu + 1/3 pu'^ - 1/6 pv^, 
l/9p+l/3pv + 1/3 pv"^ - 1/6 pw^ 
1/9 p-1/3 pu + 1/3 pu"^ - 1/6 pv^, 
l/9p-l/3pv + l/3pv^ - 1/6 pu^, 
l/36p+l/12p('u + v) + 1/8 p{u + - 1/2A p{u^ + v"^), 
l/36p-l/12p(M -v) + l/8p(-u + vf - 1/24p(m2 + v"^) 
l/36p-l/12p{u + v) + l/8p{-u-vf - l/2Ap{u^ + v'^) 
1/36 p+1/12 p{u -v) + 1/8 p{u - vf - 1/24 p{u'^ + v'^) 
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where u and v are the velocity components, i.e. {u, v)'^ = u, with pressure 
being p = p/3. The above equations (A.l) and ( |A.3 ) can be generahzed as 
follows 



p(l- 




)(1- 


^yy) ' 


p (n^x 




(1- 


^yy) 


/2, 


p (n^x 


— u] 
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^yy) 


/2, 






O^yy 


+ v) 


/2, 


p(i- 




O^yy 


-v) 


/2, 




+ u) 


O^yy 


+ v) 


/4, 




— u] 
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+ v) 
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p (n^x 


— u] 


O^yy 


-v) 


/4, 


p (n^'x 


+ u) 


O^yy 


-V) 


/4, 



(A.3) 



with the equation (A.l) being a special case of (A.3): If one assumes li^x = 
1/3 + and liyy = 1/3 + v\ then /(f) (1/3 + <T/3 + 1;^) = fi^) (if third 
order terms with respect to velocity components are neglected). However, it 
is possible to introduce more involved functions depending on additional pa- 
rameters. For instance, a quasi-equilibrium function which is useful for tuning 
bulk viscosity of both lattice Boltzmann and link-wise ACM schemes can be 
expressed as 



/('^^) (p,u,Tr) = / 



(9) 



2 ' 2 j ' 



(A.4) 



where Tr is an additional tunable parameter (usually corresponding to the 
trace of the second order tensor H = J2i ^i^ifi normalized by density (see 



also the Appendix D ) 



The D3Q19 lattice, which is suitable for three dimensional problems (Z^ = 3), 
consists of the following discrete velocities {Q = 19): vq = (0, 0, 0); Vj = 
(±1, 0, 0) and (0, ±1, 0) and (0, 0, ±1), for i = 1-6; = (±1, ±1, 0) and 
(±1, 0, ±1) and (0, ±1, ±1), for i = 7-18. Here, the the i-th function fP is 



formally identical to (A.l), with the following weights 



1/3 i = 0, 
1/18 i = 1-6, 
1/36 i = 7-n 



(A.5) 
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B Appendix: Physical derivation 



The Bohzmann equation is the ftindamental equation in kinetic theory of 
gases, describing time evolution of the distribution function of gas molecules as 
a function of time, space coordinates, and molecular velocity. The Bhatnagar- 
Gross-Krook (BGK) model equation inherits the main features of the original 
Boltzmann equation, with the fluid-dynamic description of the BGK solution 
for small Knudsen numbers being much simpler to obtain. Hence, owing to a 
remarkably less demanding effort, it come advantageous the employment of the 
BGK equation at the heart of kinetic methods for solving INSE. A well known 
drawback of the BGK equation is that the recovered Prandtl number is unity, 
while the original Boltzmann equation yields a value near to 2/3. However, 
since most of the LBM schemes do not consider the energy equation, the issue 
of inaccurate thermal diffusivity can be often neglected. At the same time, it 
is allowed to employ the isothermal BGK with a constant collision frequency 
for this purpose |2Qj. 

A crucial ingredient of any lattice Boltzmann scheme is a finite set of micro- 
scopic velocities, called lattice. The generic lattice velocity is identified by the 
subscript i, where < i < Q — 1. The LBM simulates the time evolution 
of a weakly compressible gas flow in nearly continuum regime by solving a 
kinetic equation on the lattice and yields the solution of the incompressible 
Navier-Stokes equation as its leading order. Hence, the relaxation frequency 
in the BGK equation can be expressed as a function of the kinematic viscosity 
u. The dimensionless form of the simplified BGK equation on a lattice takes 
the form 

f^+v.-v/r = i^(/i^)-/r), (B.l) 

where x, t, and Vj are the (dimensionless) space coordinates, time, and molec- 
ular velocity components, respectively; f-' is the distribution function of gas 
molecules for the i-th velocity on the lattice; f^^^ is the equilibrium distri- 
bution function. Let us suppose that u 1: then it is possible to find an 



approximated solution of (B.l) by singular regular expansion, where: 



/f = /f)-3z.v,■V/^(^) + 0(z.2). (B.2) 



Introducing the above approximation in the advection term of Eq. ( |B.l ), it 
yields 

^ = -V. ■ Vft^ + 3u (v. . Vyft^ + i- {ft^ - f!) + 0{v'). (B.3) 

Here, the goal is to derive an algorithm formulated in terms of only hydrody- 
namic quantities, i.e. the statistical macroscopic moments of f'^ corresponding 
to microscopic quantities conserved by the coUisional operator (right hand side 
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of Eq. (B.l)). In particular, the local equilibrium f^^^ is defined such that it 
has the same hydrodynamic quantities of Hence, as far as the computa- 
tion of the hydrodynamic quantities is concerned, the coUisional operator in 
(B.3) is unessential. Removing the latter term determines a modification in 
the model equation, though there is no effect on the hydrodynamic quantities. 
Let us define a new model equation by removing the coUisional term and ne- 
glecting terms 0(z/^) in (B.3), which can be re-formulated with respect to the 
new distribution function f- as follows: 



di 



2 .(e) 



(B.4) 



The above model equation (B.4) can be recast in the equivalent form 



di 



-Vi (v. ■ Vft'"^ - Vs/Vi (v. ■ V) 

(B.5) 

where the odd part of the equilibrium distribution function is defined by ([2|, 
while the even part is f^'^'^^ = f[ — fi^'°\ f]i = V2 = ^ and rj^ = rj^ = Su. 
Recalling that 

V. ■ Vft^^ - 2 (v. ■ V)V^°/^^ ^ ft^'\^ - v., i) - /^°/^Hx, t), (B.6) 

we modify once more the model equation by setting rji = Gu and 774 = 1/2 
(while other parameters remain unchanged, namely 772 = 1 and 773 = 3z/). By 
doing so, rj4/rj2 = tj^/tji = 1/2 which enables to use the approximation (B.6). 
By means of the above set of parameters, Eq. (B.5) becomes 



dfl 
di 



- {ft^'\^, i) - ft''^i± - v„ i)) - (#°)(x, i) - ft\± - v., i)] 

(B.7) 

As common in LBM, we apply the forward Euler rule for approximating first 
order time derivatives: 



(B.8) 

As far as the computation of hydrodynamic quantities is concerned, the first 
term of the right hand side in (B.8) can be substituted by f^^\'k,i) (they 
have same hydrodynamic moments). The novel model equation can thus be 
re-formulated in terms of the distribution function /j as follows 

/, (x, t+1) = ft^ (x, i) - (/f (x, i) - ft''^ (x - v„ i)) -Qu {ft'"^ (x, i) - ft'"^ (x - 

(B.9) 

or equivalently 



/,(x,t+ 1) = /f ^(x- v„t) + (1 - 6u) (/^°^(x,t) - ft'^k^ - v„£)) . (B.IO) 
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Eq. ([T]) can be recovered, upon substitution of ([T]) into (B.IO) 



C Appendix: Asympthotic analysis 



In ([l]), with X = x/Ax and i = t/At, both At and Ax must approach zero, 
though it is not clear the value of the limit: limAx^^o At/ Ax. In order to clarify 
this point, we apply Taylor expansion to ([T]) 



Mt + At) = /f ) - Ax V, . Vft^ + ^(v, . Vfft^ - ^(v, . Vfft^ + ... 



+ 2 



Axvi- V/, 



(e,o) 



Ax^ 



6 

\2 f(.e,o) 



Ax^ 



2 .v..V)Vr^ + ^(v,.V)Vi^'°^j+..., 

(c.i: 



where all the quantities are computed in the same point x and hence this is 
no more explicitly reported. A summation of Eqs. (C.l) over i yields: 



dp 
di 



Ax^ 
~At 



Ax^ 
2Ai 



(v-)'n('=) + o (^(v-)'n(^)^ ,(c.2) 



with Q*^^) = J2i ^i^i^ift^ ■ III the rightmost term of the above expression (C.2 ), 



we rely upon the fact that spatial derivatives of the fourth-order moments 
have the same growth rate of spatial derivatives of the second-order moments. 
Moreover, the diverge is raised to a power which is selected for consistency with 
the units of other terms. This is unessential, as far as the order of magnitude 



of the term is concerned. Multiplying (C.l ) by Vj and summing over i, it yields 



dt 



Ax 
At 



V ■ n(") 



Ax^ 



2)(v-W^) + o(^v-n(^)). (C.3) 



Similar considerations as Eq. (C.2) apply to the rightmost term in the above 



equation (C.3). Concerning the relationship between At and Ax, (C.3) sug- 



gests two possible strategies: 



At (X Ax, 
At oc Ax^ 



acoustic scaling, 
diffusive scaling. 



(C.4a) 
(C.4b) 



Sometimes, the dimensionless mesh spacing Ax = Ax'/ L is referred to as spac- 
ing (Ax = h) in the literature on finite difference method or even numerical 
Knudsen number (Ax = Kn) in the Lattice Boltzmann literature. Similarly, 



48 



it is possible to introduce a numerical Mach number: Ma = U/{Ax'/At'). In 
this way, the dimensionless time step At = At' /{L/U) can be expressed as 
At = KnMa. Hence, the acoustic scaling corresponds to constant Ma, while 
the diffusive scaling to Ma oc Ax. 

Regardless of the adopted strategy, the numerical scheme must converge to- 
wards the physical solution of the incompressible Navier-Stokes equations. The 
physical solution is identified by the Reynolds number, which is the recipro- 



cal of the factor multiplying the second-order spatial derivatives in Eq. (C.3), 
namely 

Ax' n i\ 1 

oc — — . (<^-5) 



At \uj 2j Re 
Hence, according to (C.5), in acoustic scaling u needs to be tuned in order 
to guarantee a constant Reynolds number, while in diffusive scaling a fixed u 
already ensures a constant Reynolds number. Moreover, in acoustic scaling, 
the smaller Ax the smaller u for keeping fixed the Reynolds number. The 
latter case can be problematic in the present LW-ACM due to the heuristic 
stability domain, 1 < w < 2, thus diffusive scaling is generally preferable. 

Similarly to standard LBM, in LW-ACM, the definition of At and Ax is 
implicit (and this is sometimes a source of confusion). In fact, the end-user 
can select both Kn oc 1/N (with the number of mesh points along the 
flow characteristic length) and Ma by a scaling factor for the velocity field. 
Hence, the acoustic scaling requires the tuning of the relaxation frequency 
u on different meshes, keeping constant the computed velocity field. On the 
other hand, the diffusive scaling corresponds to a scaling of the velocity field 
(see also the section below) on different meshes, keeping fixed the relaxation 
frequency. 



C.l Diffusive scaling 



Substituting Ax = e <^ 1 and At = into (C.3), it yields 

+ - V ■ n(^) = (---] (v-)^Q(^) + o (ev ■ n 



dt 



(C.6) 



Due to the presence of a mesh-dependent parameter in (C.6), upon conver- 
gence, all moments need to be scaled via the following post-process: 

u = (u - uo)/e, n(^) = (n(^) - no)/e^ Q(^) = (Q('=) - Qo)/e, (C.7) 

where uq. Ho and Qo are constants, with the odd quantities uq and Qo equal 
to zero. On the contrary, the even quantity Ho = PqI, where po is the average 
pressure over the whole computational domain (the average even moment II^^^ 
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over the whole computational domain depends mainly on the amount of mass 
and only slightly on the flow field). 

As a result, the normalized pressure field is defined as, p = {j> — po) / and the 
normalized density field as p = (p — po) / or equivalently p = po + p. Upon 



substitution of the latter quantities into (C.2), it follows 



V ■ u = 0{e^ 



{C.i 



Recalling that 11*^ "^^ = puu + pi and introducing the scaled quantities, we 
obtain 

Po^ + Pou ■ + = - 2) (^O'Q^^^ + 0{e'). (C.9) 
In order to recover the incompressible isothermal fluid dynamics, the quantities 
//^■* are designed [9] such that 



Qifk = ^ {uiSjk + Ujdik + Uk6ij] 



Consequently 



V . V ■ Q(^) = ^ V^u + ^ V V ■ u + 0{e') = ^V'u + 0{e') 



(C.IO) 



(C.ll) 



in Eq. (C.9), it yields 



where the last result is due to Eq. (C.8). Introducing the previous assumption 



— + u ■ Vu + — Vp = i/V^u + 0(e^). 
dt Po 



where 



1 /I 1 



3 \uj 



Introducing the previous expression into Eq. (C.2), it yields 

+ 6I/V • u = ^ V ■ fu ■ Vu + -Vp] + O(e^). 
padt 2 1 Po / 



(C.12) 



(C.13) 



(C.14) 



Taking into account the new definition given by (C.13), Eq. (C.8) should be 



and Eq. (C.12), it follows 



expressed more rigorously as V ■ u = 0{t^ /v). Combining the latter equation 

(C.15) 



1 



V ■ ( u ■ Vu + — Vp 1 = 0(e7z/). 



Introducing the above expression into Eq. (C.14), it yields 



dp 
Gpou dt 



(C.16) 
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The divergence-free condition for the velocity field requires that e^/v ^ 1, 
which is consistent with (C.12) as well. 



C.2 Acoustic scaling 



Assuming Ax = e ^ 1 and At = e, Eq. (C.3) yields 
d{pvL 



dt 



V.)2Q(-) + 0(e2). 



(C.17) 



This time, there is no need to scale all the moments and a proper tuning of u is 
sufficient instead (see below). Leaving the moments unsealed and substituting 



the above assumptions in Eq. (C.2), we obtain 

dp 



dt 



+ 



1 V-(pu) = 0(e). 



(C.18) 



The above equation (C.18) proves that acoustic scaling should not be used 



in link-wise ACM, when dealing with transient simulations, while for steady 
state flows we get: 

V-(pu)=0(6/z/), (C.19) 
showing that, if the density (pressure) gradients are small (consistently with 
the incompressible limit), the above equation provides indeed an accurate 
divergence-free velocity field. However, in acoustic (unlike diffusive scaling) 
the compressibility error cannot be reduced by mesh refinement. Taking into 



account Eq. (C.IO) and Eq. (C.19), we get 



V- V-Q(^) = -V'(pu) + 0(e/z/) 



hence 



dt 



+ V ■ (puu) + Vp = z^V^(pu) + 0(e) 



(C.20) 



(C.21) 



where z/ = e z/. If the density (and pressure) gradients are small, the solution 
to the system formed by (C.19) and ( |C.21 ) provides a reasonable approxima- 
tion of the Navier-Stokes solution in the incompressible limit. However, the 
latter system does not asymptotically converge towards the incompressible 
Navier-Stokes solution, as the mesh get finer and finer (at least, as far as the 
discretization error is smaller than the compressibility error) . This is the main 
reason why the diffusive scaling is preferred in this work. 



C. 3 Forcing 



Let us consider the forcing step described by Eq. (|4]), with Ax = e ^ 1, 
or equivalently Ma = Kn'^) where /5 is a free parameter (/3 = 



At 
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and /9 = 1 denote acoustic and diffusive scaling, respectively). The correction 
due to ^ leads to an additional term in (C.3): 



dt At At \u 



1 



(V.)2q(-) + 



'Ax' 
~At 



V ■ n^*^) + 



1 



At 
(C.22) 

From the definition At = KnMa and u = Mau (see the previous section), the 
proper scaling for the forcing term follows: 



1 



KnMa^ 



r2,g+i 



(C.23) 



A certain physical acceleration g (fixed for a given problem), can be imposed 
in the numerical code through the mesh- dependent acceleration g = e^'^"'"^ g. 



D Appendix: Computing derivatives locally 



LW-ACM allows a straightforward local computation of spatial derivatives. In 
this respect, a good example is provided by the following strategy for tuning 
bulk viscosity. Since the LW-ACM (like LBM and ACM) is an artificial com- 
pressibility scheme, bulk viscosity can be regarded as a free parameter (if the 
incompressible limit is the only concern). 

One possible strategy for tu ning the bulk viscosity is described below. Instead 
of standard fj;'^'' (see Eq. (A.l) in Appendix A) in Eq. (jl), we consider a 
modified set of functions, namely fi'^*\ defined as 



ft^ = fl"'^ (p, u, Tr(^) + 7 (Tr(+) - Tr^*^) 



(D.l) 



where /'■'^'^^ (p, u, Tr) is given by Eq. ( D.l[ ), 7 is a free parameter, Tr*-*^^ is the 
trace of the tensor 11'^'^) normalized by the density and Tr*^"*"-* is the trace of 
the tensor n^"*"^ = J2i^i^ifiit + At) normalized again by the density. Let us 
consider the two dimensional case [D = 2): by definition, Tr*-^-* = 2/3 + u^. 
Substituting the latte r int o Eq. (ID.I ) and taking into account the definition 

/ \ I I ' ' 

of fi given by Eq. (A.4), it is possible to compute the second order tensor 



of ft*\ namely 



7 



n(e*) = n("^ + ^ (Tr^+^ - Tr^"^) I. (D.2) 

At the leading order, the modified equilibrium differs from the standard one 
only due to even moments. 



In order to simplify the last term of the Eq. (D.2), by Eqs. (C.l ), we compute 
the following quantities 



n(+) = n(") - 6z/ Ax V ■ Q(') + o (Ax2(v-)^n('=)) 



(D.3) 
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where n^"*"^ = J2i^i^ifi{'t + At). Recalling the definition in (C.IO), it yields 



n(+) = n(^) - 2u Ax (yipu) + v{puf + v • (pu) i) + o [Ax\v-fn^''^ 

(D.4) 

with its trace taking the form: 



T/+) = Tr^^) - 8z/ Ax V ■ (pu) + O (Ax^iV-fn 



tie) 



(D.5) 



Considering the diffusive scaling, namely Ax = e ^ 1 and At = (see 
Appendix [C] for further details about the diffusive scaling), the previous ex- 
pression can be recast as: 



Tr 



(+) 



Tr^'^ -8poz^V-u + 0(e2). 



(D.6) 



Introducing the previous expression into Eq. (D.2) and applying the scaling 
to the remaining terms, it reads: 



Ylie*) = n^-^) - 4poz^7 V ■ ul + 0(e^ 



(D.7) 



Substituting n^^*) instead of 11'^'^) into Eq. (C.9) and taking into account Eq. 



dC.llD , it yields 
du 



Po^ + pou ■ Vu + Vp = V^u + ^ V V ■ u + 0(e' 



(D.8) 



where ^ = 2poz^ (1 + 2 7) is related to the bulk viscosity. The previous equation 
is consistent with Eq. (C.12), because the gradient of the divergence of the 
velocity field is as large as the leading error (hence it is not spoiling the 
consistency). However, the range 7 > is usually beneficial to numerical 
stability. This strategy enables to increase the bulk viscosity by using the 
updated distribution function for computing locally all required derivatives 
(involved in the divergence of the velocity field). 



E Appendix: Equivalent finite-difference formulas 



Here, we provide some finite-difference formulas fully equivalent to Eq. ([T]) for a 
chosen lattice. Let us consider the popular D2Q9 lattice ^ for two dimensional 
problems {D = 2), and consisting of nine discrete velocities {Q = 9). The 
quantities f-^^ can be explicitly defined for this lattice |9]. They allow to recover 
the incompressible Euler equations (with p = p/3), and they are consistent 
with the property given by (C.ll), which is essential for recovering Navier- 
Stokes equations. The numerical algorithm is fully defined, upon substitution 
of //''^ into the Eq. Q. 
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According to the finite-difference literature, we define the generic computa- 
tional stencil by means of cardinal directions. The generic point P with a 
position vector x = {n,m)'^ (the superscript T denotes transposition) is iden- 
tified by a pair of integers n and m. By means of the subscripts E and W, we 
denote the neighboring points {n ± 1,'m)'^, respectively. Similarly, by means 
of the subscripts and S, we mean the neighboring points (n, m ± 1)^, re- 
spectively. Two types of subscripts may be used concurrently for identifying 
the diagonal points. 



Concerning time levels, if not otherwise stated, all quantities are intended as 
computed at the generic time level t, with the superscript meaning a 
quantity at the new time level t + 1. The unknown quantities are given by 
the velocity components u = {u,v)'^ and the pressure p (the density for this 
model is given by p = 3p). Hence the equivalent finite-difference formulas 
must provide a way to compute Up, Vp and pp. 

Applying the definitions of hydrodynamic quantities to Eq. ([T|, it follows: 



Pp Up=pe{-Qu% + Que + 3f| - 2)/18 + Pwi^^w + ^"^w - 3f ^ + 2)/18 
-Pne{3une + ^uneVne - 3une + 3v%E - '^^NE + l)/36 
+PNw{'iu]^w - '^unwVnw + ^unw + 3f^vK - ^'"NW + l)/36 

-PSe{^uIe - QUseVsE - ^UsE + ^vIe + ^VsE + l)/36 

+Psw{^ulw + ^uswvsw + ^usw + 3t^|ty + 3vsw + l)/36 

+2/3{l/u - 1) [peUe - Sppup + pwuw 

+ {PneUne + PnwUnw + PseUse + PswUsw)/^ 

+ {PneVne - PnwVnw - PseVse + PswVsw)/4] , (E.l) 



Pp vt =Pn{^u% - 6vJ^ + 6vn - 2)/18 + PsiSu^ + 6vl + 6vs + 2)/18 
-Pne{3une + ^uneVne - 3une + 3v%E - '^^ne + 1)/36 
-Pw(3n^vy - ^unwVnw + ^unw + ^v^y^ - "ivj^w + l)/36 

^PSe{^u\e - QUSEVsE - ^USE + ^vIe + ^VsE + l)/36 

+Psw{^ulw + ^uswvsw + ^usw + ^vly^ + 3vsw + l)/36 

2/3(l/a; - 1) [pnVn - SppVp + psvs 

+ {PneUne - PnwUnw - PseUse + PswUsw)/4: 

+ {PneVne + PnwVnw + PseVse + PswVsw)/4] ■ (E.2) 
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p+ = -2pp(3ul + 3vl - 2) /9 

-Pe{-QuI + Que + 3^ - 2)/18 + Pw/(6m^ + Quw - 3?;^ + 2)/18 
-Piv(3M^ - Qvl + 6t;w - 2)/18 + P5(-3u| + 6t;| + Qvs + 2)/18 
+Pne{^u\e + QuneVne - 3une + 3t;^£; - 3t;Ar£; + l)/36 
+Pw(3M^vy - ^unwVnw + 3uArvi/ + 3t;^;^ - 3vnw + l)/36 
+P5£;(3m|e - 9useVse - ^use + 3vIe + 3vse + 1)/36 
+Psw{^ul^, + 9uswvsw + ^usw + ^vlw + 3vsw + l)/36 
+2/3(l/a; - 1) [-PeUe + PsVs + PwUw - PnVn 
+ {PswUsw + PnwUnw - PneUne - PseUse)/^ 
+ {pswvsw - PnwVnw - PneVne + PseVse)/^] , (E.3) 



It is important to stress that the above expressions ( |E.l ), (E.2), (E.3) for the 
considered lattice model are fully equivalent to ([T]) up to machine precision, 
since no asymptotic analysis is requested in their derivation. 



From a computational perspective, optimal implementation requires that the 
number of floating point operations are reduced as much as possible by com- 
mon subexpression elimination (CSE) [18]. Moreover, for locating the macro- 
scopic quantities (p, u, v) contiguously in the memory, it is possible to collect 
them in a single array and to use the first index for addressing them, namely 
M(l : 3, 1 : Nx, 1 : Ny) where Nx x Ny is the generic mesh. This leads to an 
optimized FD-style implementation. First of all, let us compute the following 
auxiliary quantities 



pup = 


M(l, 


,j)M(2, i,j), ^up 




pvp = 


M(l, 


,j)M(3, (fvp 


= '^pvp, 


PVN = 


M(l, 


,j + l)M(3,i,j + l), 


pvs = 


PUE = 


M(l, 


+ l,j)M(2,i + l,j), 


puw = 


PUNE = 


M(l, 


+ l,j + l)M(2,i + l, 




PUMW = 


M(l, 


-l,j + l)M(2,i-l, 




PUSE = 


M(l, 


+ l,j-l)M(2,i + l, 


J-1), 


pusw = 


M(l, 


-l,j-l)M(2,i-l, 


J-1), 


PVNE = 


M(l, 


+ l,j + l)M(3,i + l, 




PVNW = 


M(l, 


-l,j + l)M(3,i-l, 




PVSE = 


M(l, 


+ l,j-l)M(3,i + l, 


j-1). 


pvsw = 


M(l, 


-l,j-l)M(3,i-l, 


J-1), 



M(l,i,j-l)M(3,i,j-l), 
M(l,i-l,j)M(2,i-l,j) 



(E.4) 



and 
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=PUE (-M(2, i + 1, j) + 1) + M(l, i + 1, j) (M(3, i + 1, j)V2 - ri3), 
^,=puw (M(2, i - 1, j) + 1) - M(l, i - 1, j) (M(3, i - 1, j)V2 - ng), 



■>5 



:pvN (-M(3, i, j + 1) + 1) + M(l, i,j + 1) (M(2, i, j + 1)V2 - ri3 



^,=pvs (M(3, i, j - 1) + 1) - M(l, i, j - 1) (M(2, i, j - lf/2 - ng) 



■puNE (M(2, i + l,j + 1) + 3 M(3, i + 1, j + 1) - 1) + . . . 

■ ■ ■ + PVNE (M(3, i + 1, j + 1) - 1) + M(l, i + 1, j + 1) ri3, 
■■puNw (M(2, i - 1, j + 1) - 3 M(3, i - 1, j + 1) + 1) + . . . 

■ ■ ■ + pvNw (M(3, i - 1, j + 1) - 1) + M(l, i - 1, j + 1) ri3, 
:pM5£; (M(2, i + 1, j - 1) - 3 M(3, i + 1, j - 1) - 1) + . . . 

■ ■ ■ + pvsE (M(3, i + 1, j - 1) + 1) + M(l, i + 1, j - 1) ri3, 
■-pusw (M(2, i - IJ - 1) + 3 M(3, i - l,j - 1) + 1) + . . . 

■ ■ ■ + pvsw (M(3, i - l,j - 1) + 1) + M(l, i - 1, j - 1) ri3, (E.5) 



where ri3 = 1/3. By means of the quantities (E.4) and (E.5), it is possible to 
compute the following additional quantities 



(ppp 


= pup +pvp, 


4>Pm 


= pup - pvp, 


4>NEp 


= PUNE + PVNE, 


(pSWp 


= pusw+pvsw, 


4>NWm 


= PUNW - PVNW, 


4>SEm 


= PUSE - pVSE, 




= 4>NEp — 4>Pp, 




= 4>Pp ~ 4>SWp^ 


^NW 


= 4>NWm — (pPm, 




= 4>Pm — 4>SEm, 


^NWSE 


= ^NW — ^SE, 


^NESW 


= ^NE — ^SW- 



(E.6) 



Finally, auxiliary quantities are used in computing the updating formulas: 



- i{)Up M(2, i, j) - i{)Vp M(3, i, j) + 02 - 0i + 04 - 03 + 

+ 07 + 08)/4 - 64 ($W + ^SE - ^NE ' ^ Sw) + • • 

— TOlo/ -I- mii^i — T)Va\. 



p+ = M(l,i,j)r43 

■ ■ ■ + (05 + 

Vh {puE - puw + PVN - pvs), 

p+ = M+(l,i,j)=p+/3 
«+ = M+(2,i,j) = " 
a {puE 



'1 + 02 + (06 - 05 - 07 + 08)/4 + . . . 

Lpup + puw) - h i^NESw + ^nwse))/ pt^ 

= M+(3, i, j) = (03 + 04 + (07 + 08 - 05 - 06)/4 + . . . 



- a {pvN - <^vp + pvs) - 04 {^NESW - ^nwse))/Pp, 



(E.7) 
(E.8) 
(E.9) 



where r^^ = 4/3, 6 = 2 — 2/u, = a/4, 6 = 2 — 2/up and 64 = 6/4. 



The previous optimized formulas are consistent with Eqs. (E.l E.3) in case 
6 



a and 64 = 04. Kinematic viscosity is controlled via the parameter 
tu according to ([7|, whereas the parameter Up is responsible of the artificial 
compressibility. In particular, if Up 7^ u, the first term in Eq. (C.16) becomes 



proportional to e^/up, where Up is the value obtained by using Up in Eq. ([7]). 
Without additional computational costs, a proper choice of Up > u allows to 
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reduce the compressibility error in case of under-resolved simulations. 



F Appendix: Asymptotic analysis for energy equation 

Under the assumption of negligible viscous heating and conservation of internal 
energy, in the incompressible limit, the temperature field is governed by an 
advection-diffusion equation. Let us call T the normalized temperature field 
such that T < 1 in all the domain of interest, with the quantities f^^^ defined 
such that p = pT/3. 

There were a number of suggestions aiming at enabling thermal fluid-dynamic 
simulations with the lattice Boltzmann method |[32j. Among the most interest- 
ing ones, we remind: (a) Increase of the number of velocities and inclusion of 
higher-order nonlinear terms (in flow velocity) in the equilibrium distribution 
functions; (b) inclusion of finite difference corrections aiming at the fulfill- 
ment of energy conservation on standard lattices [M] and, (c) use of two sets 
of distribution functions for particle number density (/j), and energy density 
{gi), doubling the number of discrete velocities. Even though the first and the 
second approaches are preferable from the theoretical point of view, the last 
one is characterized by a much simpler implementation. When dealing with 
the incompressible limit, the pressure field is characterized by small variations. 
However, cases may be experienced where large temperature and density gra- 
dients compensate each other. If both pressure gradients and temperature gra- 
dients are small, a weak coupling between fluid dynamic and energy equations 
is realized and the simplified approach (c) discussed above can be adopted. 
This approach will be further extended below in the framework of the present 
LW-ACM. Extensions of the approach (b) in the framework of LW-ACM are 
currently under investigation as well. 

The LW-ACM approach for (weak) thermal fluid dynamic simulations makes 
use of the following system of algebraic equations 

^,(x,t + l) = ^f)(x-v„t) + 2 ['^) {gt\±,i)-gt^^\ic-^,,i)) , (F.l) 

ioY i = 0, . . . ,Qt — 1 (the number of lattice velocities for solving the thermal 
field can be different from that used for solving the velocity field, i.e. Qt ^ Q)- 
The quantities g^^'' are local functions of T = J2i 9i and Tu = Y.i '^i9i at the 
same point x and time i. The quantities gf^ are designed in order to recover the 
advection-diffusion equation, according to the constraints discussed below. In 
particular, recovering the advection-diffusion equation requires : Y.i of' = T 
and Ylii^idt' = Tu, i.e. the conservation of hydrodynamic moments, and 
J2i ^i^idt^ = n^'^'' = uu-l-T/3 1. On the other hand, the even parts of equilibria 
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gi'^^ are defined as 



9t''\T. u) = \ {gl'\T, u) + #(T, -u) j . (F.2) 



Eq. (F.l) is formulated in terms of x = x/Ax and t = t/ At. In the following, 
let us consider the diffusive scaling, namely Ax = e ^ 1 and At = (see the 
Appendix [C] for further details on diffusive scaling). Taylor expansion of Eq. 



(F.l) yields 



^.(t + e^) =^?f ^ - e V, . V^in |-(v, . V)^^?f ) 

+ (2 - ^) (^e V, . V^?f - ^(v, . V)\(^'^)) + . . . , (F.3) 

where all the quantities are computed in the same point x and time t and 
hence this is no more explicitly reported. Summation over i of the equations 



(F.3) yields 



with qI*^^ = I]i ViVjVj^ff^'*. In the rightmost term of the expression (IF. 41), we 



rely upon the fact that spatial derivatives of the fourth-order moments have 
the same growth rate of spatial derivatives of the second-order moments. More- 
over, in the same term, the diverge is raised to a power which is selected for 
consistency with the units of other terms. This is unessential, as far as the 
order of magnitude of the term is concerned. 



Similarly to the fluid dynamic equations (see the Appendix [C]), we apply the 
following post-processing for scaling all moments (arbitrary constants have 
been already omitted): 

u = u/6, nS^^ = ni^\ Ql^) = QS^Ve- (F.5) 

One possibility for automatic implementation of the latter scaling is to assume 
{Qt''')ijk = ^ {uiSjk + UjSik + UkSij) , (F.6) 



similarly to Eq. (CIO), because thus all terms in Q^*^^ become proportional to 
the velocity fleld (and they are automatically scaled by means of the flrst of 
the above scalings). 

Moreover, taking into account the second scaling and the definition of U^^^ 
reported in the main text, we conclude that = e^uu + T/3I, where T 



58 



does not need to be scaled (or equivalently T = T). Substituting the previous 
scahngs into (F.3), and taking into account ( C.8[ ), we obtain 




(F.7) 



where 



1/1 1 



a 



3 Koot 2 



(F.8) 



Table F.l 

Convergence analysis for thermal Couette flow in case of diffusive scaling, with 
Prandtl number Pr = 0.71. 







Link- 


wise ACM 






e = Ax 


Ma oc At /Ax 


u oc Re~^ 


a oc Pe~^ 


Error L'^[u] 


Error L^[T] 


1/5 
1/10 
1/20 


1.11 X 10-2 
5.55 X 10-2 
2.78 X 10-3 


5.55 X 10-2 
5.55 X 10-2 
5.55 X 10-2 


1.11 X 10-2 
1.11 X 10-2 
1.11 X 10-2 


3.10 X 10-3 
8.25 X 10-*^ 
2.12 X 10-^ 


1.61 X 10-6 
4.28 X 10-^ 
1.01 X 10-^ 



Here, a few numerical results are reported for the thermal Couette problem, 
which is realized by confining a viscous fluid in a gap between two parallel 
plates. Assuming that the one plate (hot wall located at y = and with 
temperature Tjv) moves in its own plane, whereas the other (cold wall located 
dX y = 2L and with temperature Ts) is at rest, the controlling parameters 
are the Prandtl number Pr = v/a (measuring the momentum diffusivity: 
z/, to heat diffusivity: a) and the Eckert number Ec = u^/c^AT (measuring 
the kinetic energy: pu^ to internal energy: pc^AT). Thermal Couette flow 
admits the following analytical solution of the temperature field: 

with AT= {Tn-Ts), and the Brinkman number Br = PrEc. 

Diffusive scaling was considered in our simulations, where the velocity field is 
scaled on meshes with different sizes, keeping fixed the relaxation frequency 
(see the Appen dix [C| for details). Some preliminary numerical results are re- 



ported in Table F.l for Ft = u/a = 0.71. 
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